1  Introduction

Our goal is to develop a fully automated classification scheme for computer-aided diagnosis
(CAD) in mammography. Traditional CAD classification schemes, and performance
measurement tools such as receiver operating characteristic (ROC) analysis, are based on
the premise that the observations are classified into two groups, most commonly malignant
and benign. Such classification schemes are difficult to fully automate because they are
designed to analyze radiologist-identified lesions, whereas many of the false-positive (FP)
detections produced by a computerized detection scheme cannot reasonably be classified as
benign or malignant. Our proposed scheme would classify computer detections into three
groups: malignant lesions, benign lesions, and FP computer detections. This method presents
considerable difficulties in terms of both signal detection theory and performance evaluation
methods such as ROC analysis. Our efforts in this direction have thus been more theoretical
than practical so far, but our initial results are promising.


2  Body 

A  wide  variety  of  medical  decision-making  tasks,  in  particular  tasks  for  which  CAD  has  been 
proposed  as  an  aid  to  the  physician,  can  be  formulated  as  “two-group  classification”  tasks. 
That is, the physician must use the information available about a patient (e.g., a set of
mammographic films of the patient, and the result of computer analysis of those images) to
decide whether a patient belongs to a diseased, or abnormal, group or not (e.g., whether a
breast  lesion  suspicious  enough  to  warrant  further  imaging  procedures  or  biopsy  is  present 
or  not). 

ROC analysis has long been considered the most appropriate methodology for evaluating
the performance of a two-group classifier or observer [1], particularly for medical
decision-making tasks [2]. Furthermore, the optimal or “ideal” observer (that observer which
achieves the best possible performance given a particular population of observational data)
has also been well understood for quite some time [3]. In practice, the ideal observer
requires  knowledge  of  the  probability  density  functions  (PDFs)  from  which  the  observational 
data  are  drawn,  and  thus  cannot  be  achieved  in  non-trivial  tasks  by  human  or  automated 
observers.  Nevertheless,  successful  methods  for  estimating  ideal  observer  decision  variables 
from  a  sample  of  observational  data  [4],  and  for  plotting  an  ideal  observer  ROC  curve  from 
a  sample  of  decision  variable  data  [5],  have  been  developed. 

Although the form of the three-group ideal observer has also been known for some time [3],
the development of a practical three-group classifier and a fully general extension of ROC
analysis to three-group classification has proven quite difficult, primarily due to the
tremendous increase in complexity encountered when one moves from two-group to three-group
classification tasks. Briefly, characterizing the performance of a three-group classifier requires an
ROC  “hypersurface”  with  five  degrees  of  freedom  in  a  six-dimensional  ROC  space  [6, 7]  (by 
contrast,  a  two-group  classifier  is  fully  described  by  a  simple  curve  in  a  two-dimensional  ROC 
space).  Despite  these  difficulties,  our  research  efforts  are  focused  on  the  development  of  a 
three-group  classifier  and  performance  evaluation  methodology  for  breast  lesion  classification 
in  a  mammographic  CAD  system. 
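The dimension counts quoted above follow from simple bookkeeping: an N-group task has N x N conditional probabilities P(decision i | truth j), of which N are fixed by normalization, and the ROC hypersurface then has one fewer degree of freedom than the space it lives in. A minimal sketch of that count follows; the function name `roc_space_dimensions` is illustrative only and does not come from the cited work.

```python
def roc_space_dimensions(n_groups: int) -> tuple[int, int]:
    """Return (ROC space dimension, hypersurface degrees of freedom)."""
    if n_groups < 2:
        raise ValueError("need at least two groups")
    # N*N conditional probabilities P(decide i | truth j); the probabilities
    # for each truth state sum to one, which removes N of them.
    space_dim = n_groups * n_groups - n_groups
    # The ROC hypersurface gives one coordinate as a function of the others.
    surface_dof = space_dim - 1
    return space_dim, surface_dof


for n in (2, 3, 4):
    print(n, roc_space_dimensions(n))
```

For two groups this gives a curve (one degree of freedom) in a two-dimensional space, and for three groups the five degrees of freedom in six dimensions mentioned in the text.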

We strongly believe the development of such a three-group classifier to be of practical and
not merely academic importance. In the past, two types of mammographic CAD schemes
have been investigated at the University of Chicago: one for automatically detecting mass
lesions in mammograms [8-12], and one for classifying known lesions as malignant or
benign [13-17]. Combining these two types of CAD scheme is inherently difficult, because
the output of the detection scheme will necessarily include FP computer detections in
addition to the malignant and benign lesions to be classified. These FP computer detections
correspond to objects which were by design not included in the training sample of the
classification scheme, because they are not members of the data population (benign and malignant
mass breast lesions) for which the classification scheme was created. It is clear then that
the detection scheme’s output cannot be used unmodified as the input to the classification
scheme.

Our  approach  has  been  to  treat  this  problem  explicitly  as  a  three-group  classification 
task.  That  is,  the  output  of  the  detection  scheme  should  be  classified  as  malignant  lesions, 
benign  lesions,  and  non-lesions  (FP  computer  detections),  and  the  classifier  to  be  estimated 
is  the  ideal  observer  decision  function  for  this  task.  If  successful,  this  approach  would  allow 
radiologists  to  identify  more  malignant  lesions  without  increasing  biopsy  rates  for  patients 
without  malignancy. 

Our  approved  Statement  of  Work  is  as  follows: 

Task 1.  Develop a three-group classifier for clustered microcalcifications in mammograms, Months
1-12.

(a) Collect cases containing 180 malignant and 180 benign clusters of microcalcifications.

(b) Determine truth state of imaged lesions by reviewing the images, radiologist reports,
and pathology reports for these cases.

(c)  Obtain  at  least  180  FP  computer  detections  from  these  cases  using  the  existing 
detection  scheme. 

(d)  Train  and  test  a  three-group  classifier  on  these  lesions,  using  methodology  we 
previously  developed  for  mass  lesions. 

Task  2.  Design  and  develop  an  interface  for  an  intelligent  workstation  for  CAD,  Months  11-14. 

(a)  Examine  the  most  useful  features  of  the  interface  of  the  existing  intelligent  CAD 
workstation  for  mammographic  lesion  detection. 

(b)  Examine  the  most  useful  features  of  the  interface  of  the  existing  CAD  schemes  in 
our  laboratory  for  classifying  manually  detected  lesions  as  malignant  or  benign. 

(c)  Develop  a  simple  interface  drawing  on  the  advantages  of  the  existing  detection 
and  classification  schemes,  extended  to  the  three-group  classification  task. 

(d)  Test  the  interface  with  non-radiologist  observers  in  our  laboratory  familiar  with 
the  goals  of  CAD  and  with  interface  design  principles. 

Task  3.  Design  and  perform  a  pilot  observer  study  measuring  radiologists’  performances  using 
the  three-group  classification  schemes  and  traditional  two-group  classification  schemes, 
Months  15-24. 

(a)  Recruit  radiologists  from  our  institution  and  neighboring  institutions. 
(b) Provide training to the radiologists in the use of the intelligent CAD workstation
interfaces. 

(c)  Measure  radiologist  performance  using  the  three-group  intelligent  workstation, 
and  using  the  existing  intelligent  workstation  for  detecting  lesions  followed  by 
manual  selection  of  lesions  to  be  analyzed  by  the  existing  schemes  for  two-group 
classification  of  lesions. 

Task  4.  Develop  techniques  to  compare  radiologists’  performance  in  using  the  proposed  three- 
group  and  traditional  two-group  classification  schemes,  Months  18-36. 

(a) Develop methodology to extend two-group ROC analysis to tasks in which observations
are classified into three groups.

(b) Develop methodology to determine the statistical significance of measured differences
in performance between three-group classifiers.

(c)  Use  this  methodology  to  analyze  the  observer  data  obtained  in  Task  3. 

Our research accomplishments to date have focused almost entirely on Task 4. Although the
“methodology we previously developed for mass lesions” [18] was successful for estimating
ideal observer decision variables based on lesion feature data, a practical classifier to make
use of this decision variable data has not yet been implemented. As the difficulties in
theoretically characterizing the behavior of such a three-group classifier are intimately related to
evaluation of such a classifier’s performance (i.e., the development of a three-group extension
to ROC analysis), such a reordering of the approved tasks seems logically justified.

By  far  the  most  important  result  achieved  so  far  was  our  discovery  and  proof  that  an 
obvious  generalization  of  the  well-known  performance  metric,  the  area  under  the  ROC  curve 
(AUC), is not in fact useful in tasks with three or more groups [19]. (See Appendix A.)
This accomplishment relates directly to Task 4(b) above, which implicitly requires a
well-defined performance metric with respect to which the statistical significance of differences
in performance may be computed. Although arguably a “negative” rather than “positive”
result (a well-defined performance metric has not yet been found), this result has been
very well received in the observer performance and CAD research communities. First, it
serves  as  a  striking  yet  typical  example  of  how  intuition  can  often  be  an  unreliable  guide  in 
extending  methodology  from  the  two-group  classification  task  to  tasks  with  three  or  more 
groups.  Second,  it  clearly  indicates  that  the  search  for  such  a  well-defined  performance  metric 
will  yield  a  deeper  understanding  of  the  properties  of  three-group  observer  performance, 
particularly  as  characterized  by  ROC  analysis. 

We stated above that exact determination of the ideal observer’s decision variables requires
knowledge of the PDFs from which the observational data to be classified were drawn.
The  tool  we  have  been  using  for  some  time  now  to  estimate  ideal  observer  decision  variables 
from  samples  of  observational  data  is  the  Bayesian  artificial  neural  network  (BANN)  [20]. 
In  previous  simulation  studies  in  which  the  PDFs  of  the  observational  data  are  known,  the 
output  of  the  BANN  was  found  to  agree  with  the  calculated  ideal  observer  decision  variables 
for  two-group  [4]  and  three-group  [21]  classification  tasks.  In  practice,  one  does  not  have 
the  PDFs  of  real  observational  data,  but  we  previously  developed  a  means  of  evaluating 
three-group  BANN  decision  variables  by  comparing  them  with  two-group  BANN  decision 
variables  obtained  from  simplified  two-group  tasks  using  the  same  observational  data  [18]. 
During the past year, we developed an independent technique for evaluating three-group
BANN estimates of ideal observer decision variables, again based on theoretical properties
of the three-group ideal observer [22]. (See Appendix B.) This result is important because
the  three-group  classifier  we  are  developing  under  the  current  research  will  be  trained  and 
tested  using  feature  data  from  actual  mammograms;  thus,  we  will  not  have  access  to  the 
PDFs  from  which  those  data  are  drawn.  In  addition  to  three-group  ROC  analysis  methods 
to  be  developed  by  extension  from  existing  two-group  methods  [5],  it  will  be  beneficial  to 
have  a  direct  method  of  judging  the  ability  of  the  BANN  decision  variables  to  accurately 
estimate  ideal  observer  decision  variables. 

In our efforts to develop a three-group classifier and appropriate performance evaluation
methodology, we have made every attempt to keep our analysis as general as possible
despite the theoretical difficulties this entails. Other researchers have proposed three-group
methodology by considering observers whose behavior is restricted in particular ways, or
by considering only a subset of the possible performance characterization indices (the axes
of ROC space), or both [23-25]. The inherent complexity of the three-group classification
task makes direct comparison of different methods by different researchers difficult. To
facilitate such a comparison, we evaluated the different methods in terms of the three-group
ideal observer, both in preliminary work [26] (see Appendix C) and later through more
in-depth analysis [27] (see Appendix D). In addition to providing us with valuable insight
and experience in comparing different classifiers, which should ultimately prove directly
relevant to Task 4 above, this work also enabled us to present to the observer performance
and CAD research communities a useful framework within which comparison of superficially
very different classifiers can readily be made.

Most  recently,  we  have  thoroughly  investigated  the  behavior  of  the  three-group  ideal 
observer. In particular, it is well known that the three-group ideal observer makes decisions
by  partitioning  a  plane  of  two  decision  variables  into  three  regions  using  three  decision 
boundary  lines  [3].  We  showed  that  the  locations  and  orientations  of  these  decision  boundary 
lines  are  not  arbitrary;  given  the  slopes  and  y-intercepts,  for  example,  of  two  of  the  lines, 
those  of  the  third  line  are  constrained  to  lie  within  a  particular  range  of  values  [28].  (See 
Appendix  E.)  A  detailed  understanding  of  such  properties  of  the  three-group  ideal  observer 
will  prove  crucial  to  the  calculation  of  observer  ROC  operating  points,  and  by  extension  to 
observer  performance  evaluation  in  general.  Since  the  initiation  of  funding  for  this  project, 
the  principal  investigator  and  mentor  have  been  holding  regular  meetings  to  discuss  the 
theoretical  challenges  posed  by  this  project  and  to  explore  possible  ways  of  overcoming 
those  challenges. 

3  Key  Research  Accomplishments 

•  Proof  that  an  obvious  generalization  of  the  well-known  two-group  performance  metric, 
the  AUC,  is  not  useful  in  classification  tasks  with  three  or  more  groups  (Appendix  A) 

•  Development  of  a  novel  technique  for  evaluating  the  quality  of  BANN  estimates  of  ideal 
observer  decision  variables  in  the  absence  of  three-group  ROC  analysis  methodology 
and  observational  data  PDFs  (Appendix  B) 

•  Analysis  of  several  proposed  three-group  classification  methods  in  the  literature  in 
terms  of  the  three-group  ideal  observer  (Appendices  C,  D) 
•  Detailed investigation of the relationships among the decision boundary lines used by
the  three-group  ideal  observer  (Appendix  E) 

4  Reportable  Outcomes 

•  D. C. Edwards, C. E. Metz, and R. M. Nishikawa, “The hypervolume under the ROC
hypersurface of ‘near-guessing’ and ‘near-perfect’ observers in N-class classification
tasks,” IEEE Trans. Med. Imag., vol. 24, pp. 293-299, 2005.

•  D. C. Edwards and C. E. Metz, “Evaluating Bayesian ANN estimates of ideal observer
decision variables by comparison with identity functions,” in Proc. SPIE Vol. 5749
Medical Imaging 2005: Image Perception, Observer Performance, and Technology
Assessment, Miguel P. Eckstein and Yulei Jiang, Eds., SPIE, Bellingham, WA, 2005, pp.
174-182. [Conference presentation and proceedings paper.]

•  D.  C.  Edwards  and  C.  E.  Metz,  “Review  of  several  proposed  three-class  classification 
decision  rules  and  their  relation  to  the  ideal  observer  decision  rule,”  in  Proc.  SPIE  Vol. 
5749  Medical  Imaging  2005:  Image  Perception,  Observer  Performance,  and  Technology 
Assessment,  Miguel  P.  Eckstein  and  Yulei  Jiang,  Eds.,  SPIE,  Bellingham,  WA,  2005, 
pp.  128-137.  [Conference  presentation  and  proceedings  paper.] 

•  D. C. Edwards and C. E. Metz, “Analysis of proposed three-class classification decision
rules in terms of the ideal observer decision rule,” J. Math. Psychol., 2005, (submitted).

•  D.  C.  Edwards  and  C.  E.  Metz,  “Restrictions  on  the  three-class  ideal  observer’s  decision 
boundary  lines,”  IEEE  Trans.  Med.  Imag.,  2005,  (submitted). 

5  Conclusions 

During  the  past  year  we  have  focused  our  efforts  on  theoretical  understanding  of  the  behavior 
and  properties  of  the  three-group  classifier.  We  have  proven  that  an  obvious  generalization  of 
the  well-known  two-group  performance  metric,  the  AUC,  is  not  in  fact  a  useful  performance 
metric  for  classification  tasks  with  three  or  more  groups.  We  have  developed  an  evaluation 
technique,  independent  of  those  we  have  previously  developed,  for  assessing  the  ability  of 
BANN  decision  variables  to  accurately  estimate  ideal  observer  decision  variables.  We  have 
analyzed  several  recently  proposed  three-group  classification  methods  in  terms  of  the  three- 
group  ideal  observer.  Finally,  we  have  shown  that  the  three  decision  boundary  lines  used  by 
the  three-group  ideal  observer  are  not  arbitrary,  but  are  intricately  related  to  one  another. 

Although these results are theoretical, they are crucial steps in the development of a practical
three-group classifier and a fully general three-group performance evaluation methodology.
Despite the considerable difficulties involved in such development, a CAD scheme
incorporating  a  three-group  classifier  as  we  propose  could  potentially  allow  radiologists  to 
detect  more  malignant  breast  lesions  without  increasing  their  FP  biopsy  rate.  We  believe 
this  goal  to  be  worth  the  necessary  effort  on  our  part. 
The Hypervolume Under the ROC Hypersurface of
“Near-Guessing” and “Near-Perfect” Observers in
N-Class Classification Tasks

Darrin  C.  Edwards*,  Charles  E.  Metz,  and  Robert  M.  Nishikawa 


Abstract: We express the performance of the N-class
“guessing” observer in terms of the $N^2 - N$ conditional
probabilities which make up an N-class receiver operating
characteristic (ROC) space, in a formulation in which sensitivities are
eliminated in constructing the ROC space (equivalent to using
false-negative fraction and false-positive fraction in a two-class
task). We then show that the “guessing” observer’s performance
in terms of these conditional probabilities is completely described
by a degenerate hypersurface with only $N - 1$ degrees of freedom
(as opposed to the $N^2 - N - 1$ required, in general, to achieve a
true hypersurface in such a ROC space). It readily follows that the
hypervolume under such a degenerate hypersurface must be zero
when $N > 2$. We then consider a “near-guessing” task; that is, a
task in which the $N$ underlying data probability density functions
(pdfs) are nearly identical, controlled by $N - 1$ parameters which
may vary continuously to zero (at which point the pdfs become
identical). With this approach, we show that the hypervolume
under the ROC hypersurface of an observer in an N-class
classification task tends continuously to zero as the underlying data pdfs
converge continuously to identity (a “guessing” task). The
hypervolume under the ROC hypersurface of a “perfect” ideal observer
(in a task in which the $N$ data pdfs never overlap) is also found
to be zero in the ROC space formulation under consideration.
This suggests that hypervolume may not be a useful performance
metric in N-class classification tasks for $N > 2$, despite the
utility of the area under the ROC curve for two-class tasks.

Index Terms: N-class classification, ROC analysis, ROC performance metrics.


I.  Introduction 

We are attempting to develop a fully automated mass
lesion classification scheme for computer-aided diagnosis
(CAD) in mammography. This scheme will combine
two schemes developed at the University of Chicago: one for
automatically detecting mass lesions in mammograms [1]-[5],
and one for classifying known lesions as malignant or benign
[6]-[10]. Combining these two types of CAD scheme is inherently
difficult, because the output of the detection scheme will
necessarily include false-positive (FP) computer detections in
addition to the malignant and benign lesions to be classified.
These FP computer detections correspond to objects which
were by design not included in the training sample of the
classification scheme, because they are not members of the
data population (benign and malignant mass breast lesions) for
which the classification scheme was created. It is clear then
that the detection scheme’s output cannot be used unmodified
as the input to the classification scheme.

Our approach has been to treat this problem explicitly as a
three-class classification task. That is, the outputs of the detection
scheme should be classified as malignant lesions, benign
lesions, and nonlesions (FP computer detections), and the classifier
to be estimated is the ideal observer decision function for
this task. Such an approach presents considerable difficulties of
its  own.  On  the  one  hand,  decision  functions,  in  particular  ideal 
observer  decision  functions,  increase  rapidly  in  complexity  with 
the  number  of  classes  involved.  On  the  other  hand,  fully  general 
performance  evaluation  methods,  in  particular  a  fully  general 
three-class  extension  of  receiver  operating  characteristic  (ROC) 
analysis,  have  yet  to  be  developed  for  such  a  task. 

Although we have had preliminary success in using Bayesian
artificial neural networks (BANNs) [11], [12] to estimate three-class
ideal-observer-related decision variables [13], [14], the
task of developing an extension of ROC analysis to classification
tasks with three or more classes has proved somewhat more
daunting. Our initial efforts in this direction have, thus, been
more theoretical than practical so far [15]. One issue we began
to investigate recently was the calculation of an obvious
generalization of the well-known area under the ROC curve (AUC)
performance metric, a quantity we are calling the “hypervolume
under the ROC hypersurface.” Detailed consideration of the
integrals involved in calculating this quantity led us to the
counterintuitive conclusion that, despite the great success and utility
of the AUC performance metric in two-class classification tasks,
the hypervolume under the ROC hypersurface does not appear
to be a useful performance metric in N-class classification tasks
for $N > 2$. The proof of this claim is arrived at by considering
observer performance in two extremes: the “guessing” observer
and the “perfect” observer. It should be explicitly noted that in
our formulation, sensitivities are eliminated in constructing the
ROC space; this is equivalent to using false-negative fraction
(FNF) and false-positive fraction (FPF) in a two-class task. In
such a formulation, the “guessing” observer in a two-class task
achieves an AUC of 0.5 as expected, but the “perfect” observer
in a two-class task achieves an AUC of zero.
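The two limiting values quoted for the FNF-versus-FPF formulation can be checked numerically. The sketch below assumes an equal-variance binormal observer with class separation `d_prime`; that model, and the names `phi` and `area_under_fnf_vs_fpf`, are illustrative choices made here and are not taken from the paper. Under that assumption the area under the (FPF, FNF) curve is one minus the conventional AUC, so it equals 0.5 at zero separation and falls toward zero as the classes separate.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_under_fnf_vs_fpf(d_prime: float, n_points: int = 4001) -> float:
    """Trapezoidal area under the (FPF, FNF) curve for an equal-variance
    binormal observer whose class means differ by d_prime."""
    # Sweep the decision threshold c over a range wide enough to cover
    # both class distributions.
    lo, hi = -10.0, 10.0 + d_prime
    cs = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    fpf = [1.0 - phi(c) for c in cs]       # actually-negative cases called positive
    fnf = [phi(c - d_prime) for c in cs]   # actually-positive cases called negative
    area = 0.0
    for k in range(1, n_points):
        # FPF decreases as the threshold rises, so the step is fpf[k-1] - fpf[k].
        area += 0.5 * (fnf[k] + fnf[k - 1]) * (fpf[k - 1] - fpf[k])
    return area


for d in (0.0, 1.0, 8.0):
    print(d, area_under_fnf_vs_fpf(d))
```

With `d_prime = 0` the curve is the straight line FNF = 1 - FPF, and with a large separation the curve hugs the axes.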

In Section II, we consider the properties of the “guessing”
observer in an N-class classification task, and of its ROC
hypersurface. In Section III, we consider the properties of the
ROC hypersurface of a so-called “near-guessing” observer,
i.e., an observer in a task for which the observational data
probability density functions (pdfs) are not identical, but differ
only by arbitrarily small amounts. In Section IV, we then show
that the hypervolume under the ROC hypersurface of such
a “near-guessing” observer will continuously approach the
hypervolume under the ROC hypersurface of the “guessing”
observer as the observational data pdfs continuously approach
identity; furthermore, the hypervolume under the ROC
hypersurface of the “guessing” observer is shown to be zero.

We then show in Section V that the hypervolume under the
ROC hypersurface of the “perfect” observer is zero (as expected
by analogy with the two-class task), and that the hypervolume
under the ROC hypersurface of a “near-perfect” observer will
approach zero continuously as the observational data pdfs are
separated. Finally, in Section VI, we argue that these results
taken together imply that the hypervolume under the ROC
hypersurface is not a useful performance metric in N-class
classification tasks for $N > 2$, despite the utility of the AUC
performance metric in two-class tasks.


II.  The ROC Hypersurface of the N-Class “Guessing”
Observer

The performance of an observer in an N-class classification
task is completely determined by a hypersurface with $N^2 - N - 1$
degrees of freedom in an $(N^2 - N)$-dimensional ROC space
[16]. Without loss of generality, we can specify any point in
the ROC space by a vector of the misclassification probabilities
$[P(\mathbf{d} = \pi_1 \mid \mathbf{t} = \pi_2), \ldots, P(\mathbf{d} = \pi_1 \mid \mathbf{t} = \pi_N),
P(\mathbf{d} = \pi_2 \mid \mathbf{t} = \pi_1), P(\mathbf{d} = \pi_2 \mid \mathbf{t} = \pi_3), \ldots, P(\mathbf{d} = \pi_2 \mid \mathbf{t} = \pi_N),
\ldots, P(\mathbf{d} = \pi_N \mid \mathbf{t} = \pi_{N-1}), P(\mathbf{d} = \pi_N \mid \mathbf{t} = \pi_1)]^T$ [15].
Here the $N$ classes are denoted by the labels $\pi_1, \ldots, \pi_N$; $\mathbf{d}$
denotes the class to which an observation is assigned (the
“decision”); and $\mathbf{t}$ is the class to which it actually belongs (the
“truth”). We use boldface type to denote statistically variable
quantities. For simplicity, we write $P(\mathbf{d} = \pi_i \mid \mathbf{t} = \pi_j)$ as $P_{ij}$.
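To make the coordinate convention concrete, the sketch below estimates the $P_{ij}$ from a table of counts and arranges them in the order described above, with the block for the last decision class ending in $P_{N1}$. The function name `misclassification_vector`, the layout of `counts` (rows are decisions, columns are truth states), and the example numbers are assumptions made for illustration only.

```python
def misclassification_vector(counts):
    """counts[i][j] = number of cases with decision i and truth j (0-based).

    Returns the N*N - N estimated misclassification probabilities P_ij
    (i != j): one block per decision class, with P_N1 placed last.
    """
    n = len(counts)
    # Normalize each truth column so that sum_i P_ij = 1 for every j.
    col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    p = [[counts[i][j] / col_totals[j] for j in range(n)] for i in range(n)]
    vec = []
    for i in range(n - 1):
        vec.extend(p[i][j] for j in range(n) if j != i)
    # Last block: P_N2, ..., P_N(N-1), followed by P_N1.
    vec.extend(p[n - 1][j] for j in range(1, n - 1))
    vec.append(p[n - 1][0])
    return vec


# Hypothetical three-class counts (10 cases per truth state).
example = [[8, 1, 2],
           [1, 7, 3],
           [1, 2, 5]]
print(misclassification_vector(example))
```

For three classes the vector has six entries, matching the six-dimensional ROC space; the diagonal (correct-decision) probabilities are omitted because normalization of each truth column determines them.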

We can also, again without loss of generality, consider the
ROC hypersurface to be given by $P_{N1}$ considered as a function
of the other $N^2 - N - 1$ misclassification probabilities [15].
Note that this formulation is equivalent, in a two-class
classification task, to using FPF and FNF to characterize the ROC
curve, rather than FPF and true-positive fraction (TPF), as is
more common. In a two-class classification task, this produces
ROC curves which are “upside-down” with respect to the
standard formulation; we have adopted the nonstandard formulation
described above because it has proven easier to generalize to
classification tasks with more than two classes.

Some researchers have suggested [17], [18] that in, e.g., a
three-class classification task, the set of three “sensitivities”
($P(\mathbf{d} = \pi_i \mid \mathbf{t} = \pi_i)$ in our notation) provides a complete
description of observer performance. This is incorrect in general,
because it ignores the $N^2 - N$ misclassification probabilities,
not all of which are determined uniquely by the “sensitivities”
when $N > 2$ unless particular restrictions are imposed
on the observer’s behavior. Complete quantification of the
trade-offs available among the probabilities of various kinds
of misclassification error is important in medical diagnosis,
where different misclassification errors often have substantially
different clinical consequences. Moreover, restrictions
concerning the observer’s behavior are inappropriate when
considering the general behavior of ideal observers, human
observers, or automated observers (such as automated schemes
for computer-aided diagnosis) designed to approximate ideal
or human observer behavior. Other researchers have reduced
the three-class ROC hypersurface to more tractable two-dimensional
surfaces in three-dimensional ROC spaces by explicitly
imposing restrictions on the form of the observer’s decision
rule [19], [20], or on the utilities used by an ideal observer
[21]. While such restrictions may ultimately prove to be of
great pragmatic importance given the inherent complexity of
multi-class classification tasks, our approach so far has been
to attempt as general an understanding as possible of the
unrestricted classification task.
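A small numerical illustration of why the sensitivities alone are not a complete description: the two hypothetical three-class observers below share identical sensitivities yet distribute their errors quite differently. The matrices are invented for this sketch (rows are decisions, columns are truth states) and do not come from any cited study.

```python
# Entry [i][j] is P(decision i | truth j); each column sums to one.
observer_a = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
observer_b = [
    [0.8, 0.2, 0.0],
    [0.0, 0.8, 0.2],
    [0.2, 0.0, 0.8],
]


def sensitivities(p):
    """Diagonal entries P(decision i | truth i)."""
    return [p[i][i] for i in range(len(p))]


def misclassifications(p):
    """Off-diagonal entries P(decision i | truth j), i != j."""
    n = len(p)
    return [p[i][j] for i in range(n) for j in range(n) if i != j]


def columns_sum_to_one(p, tol=1e-12):
    """Check that each truth column is a valid probability distribution."""
    n = len(p)
    return all(abs(sum(p[i][j] for i in range(n)) - 1.0) < tol for j in range(n))


print(sensitivities(observer_a) == sensitivities(observer_b))            # same diagonal
print(misclassifications(observer_a) == misclassifications(observer_b))  # different errors
```

If, say, calling a malignant lesion a nonlesion is far costlier than calling it benign, these two observers would have very different clinical value despite their matching sensitivities.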

Consider the performance of an observer which makes
decisions by “guessing,” that is, in a random fashion unrelated
to the actual class $\mathbf{t}$ from which a given observation is drawn.
(Note that this corresponds to the performance of the ideal
observer when the pdfs of the observational data are identical, i.e.,
$p(\vec{x} \mid \pi_1) = p(\vec{x} \mid \pi_2) = \cdots = p(\vec{x} \mid \pi_N)$.) In this case, we
clearly must have

$$P_{12} = P_{13} = \cdots = P_{1N} \tag{1}$$

$$P_{21} = P_{23} = \cdots = P_{2N} \tag{2}$$

$$\vdots$$

$$P_{N1} = P_{N2} = \cdots = P_{N(N-1)}. \tag{3}$$
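These equalities can be illustrated by simulation. In the sketch below, a hypothetical observer chooses decision $i$ with a fixed probability `alphas[i]` regardless of the true class, and the estimated conditional probabilities come out (approximately) constant along each row. The function name, sample size, and probability values are assumptions made for this example.

```python
import random


def simulate_guessing(alphas, n_per_class=50000, seed=0):
    """Estimate P(d = i | t = j) for an observer that picks decision i
    with probability alphas[i], ignoring the true class j."""
    rng = random.Random(seed)
    n = len(alphas)
    decisions = list(range(n))
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):  # loop over true classes
        # The draw does not depend on j: that is what "guessing" means here.
        draws = rng.choices(decisions, weights=alphas, k=n_per_class)
        for i in draws:
            p[i][j] += 1.0 / n_per_class
    return p


p_hat = simulate_guessing([0.5, 0.3, 0.2])
for row in p_hat:
    print([round(v, 3) for v in row])
```

Each printed row should be nearly constant, so the whole table is governed by the decision probabilities alone rather than by $N^2 - N - 1$ independent quantities.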

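Defining $\alpha_i = P_{iN}$ for $1 \le i \le N - 1$, and $\alpha_N = P_{N(N-1)}$,
we see that the performance of the “guessing” observer is given
by a locus of vectors of the form

$$
\Bigl(\underbrace{\alpha_1, \ldots, \alpha_1}_{N-1\ \text{elements}},\ \ldots,\
\underbrace{\alpha_i, \ldots, \alpha_i}_{N-1\ \text{elements}},\ \ldots,\
\underbrace{\alpha_N, \ldots, \alpha_N}_{N-1\ \text{elements}}\Bigr)^T \tag{4}
$$

where all of the $\alpha_i$ are restricted to the range [0,1]. Furthermore,
note that

$$
\begin{aligned}
P(\mathbf{d} = \pi_i) &= \sum_{j=1}^{N} P_{ij}\, P(\mathbf{t} = \pi_j) \\
&= \sum_{j=1}^{N} \alpha_i\, P(\mathbf{t} = \pi_j) \\
&= \alpha_i
\end{aligned} \tag{5}
$$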
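which immediately gives $\alpha_N = 1 - \sum_{i=1}^{N-1} \alpha_i$. Thus, the performance of the “guessing” observer is given by

\[
\begin{bmatrix} P_{12} \\ P_{13} \\ \vdots \\ P_{1N} \\ \vdots \\ P_{i1} \\ \vdots \\ P_{ij}\ \{i \neq j\} \\ \vdots \\ P_{iN} \\ \vdots \\ P_{N(N-1)} \\ \vdots \\ P_{N1} \end{bmatrix}
=
\begin{bmatrix} \alpha_1 \\ \alpha_1 \\ \vdots \\ \alpha_1 \\ \vdots \\ \alpha_i \\ \vdots \\ \alpha_i \\ \vdots \\ \alpha_i \\ \vdots \\ 1 - \sum_{j=1}^{N-1} \alpha_j \\ \vdots \\ 1 - \sum_{j=1}^{N-1} \alpha_j \end{bmatrix}
= \vec{u}_0 + \sum_{i=1}^{N-1} \alpha_i \vec{u}_i. \tag{6}
\]

Here each block of identical entries contains $N-1$ elements. This is the parametric equation for an $(N-1)$-dimensional plane in an $(N^2-N)$-dimensional space; the actual performance of the “guessing” observer will of course be further restricted to a region within this plane such that $0 \le \alpha_i \le 1$, $0 \le 1 - \sum_i \alpha_i \le 1$.


III. The ROC Hypersurface of an N-Class
“Near-Guessing” Observer

Consider observational data $\vec{x}$ drawn from $N$ pdfs

\[ p(\vec{x}|t = \pi_1) = p(\vec{x}|t = \pi_N) + \delta_1 h_1(\vec{x}) \tag{7} \]
\[ \vdots \]
\[ p(\vec{x}|t = \pi_j) = p(\vec{x}|t = \pi_N) + \delta_j h_j(\vec{x}) \tag{8} \]
\[ \vdots \]
\[ p(\vec{x}|t = \pi_N) \tag{9} \]

where $0 \le \delta_j \le 1$, $\int h_j(\vec{x})\,d^n\vec{x} = 0$, and $|h_j(\vec{x})| \le p(\vec{x}|t = \pi_N)$ for $1 \le j \le N-1$. In the limit as the $\delta_j$ all approach zero, we expect the performance of any observer for this task to converge smoothly to that of the “guessing” observer.

Decisions are made by partitioning the decision variable space into $N$ regions, determined by a total of $N^2-N-1$ parameters; we denote these parameters by the components of a vector $\vec{\gamma}$. An observer which uses more than $N^2-N-1$ parameters for an $N$-class classification task can always be replaced by a simplified observer, such that the “excess” parameters are eliminated by the requirement that $P_{N1}$ be minimized, thereby collapsing the dimensionality of the parameter space to $N^2-N-1$. On the other hand, an observer which uses fewer than $N^2-N-1$ decision parameters will fail to generate a true ROC hypersurface, i.e., one with $N^2-N-1$ degrees of freedom in the $(N^2-N)$-dimensional ROC space. (An example in a three-class classification task would be an observer which sequentially performs a pair of binary classification tasks by first classifying observations as being “$\pi_1$” or “not $\pi_1$” based on the value of a single decision parameter, and then further classifying the “not $\pi_1$” observations as “$\pi_2$” or “$\pi_3$” based on the value of a second decision parameter [17], thus depending on fewer than the five degrees of freedom needed in a three-class classification task.) Such “degenerate” observers will not be considered here (apart from the “guessing” observer itself).

We can, thus, define $N$ regions which partition the original data space, given particular values of the parameters $\vec{\gamma}$, by

\[ \mathcal{D}_1(\vec{\gamma}) = \{\vec{x} : d = \pi_1 \text{ given } \vec{\gamma}\} \tag{10} \]
\[ \vdots \]
\[ \mathcal{D}_i(\vec{\gamma}) = \{\vec{x} : d = \pi_i \text{ given } \vec{\gamma}\} \tag{11} \]
\[ \vdots \]
\[ \mathcal{D}_N(\vec{\gamma}) = \{\vec{x} : d = \pi_N \text{ given } \vec{\gamma}\}. \tag{12} \]

For a nonrandom observer, the $\mathcal{D}_i$ can be expected to depend implicitly on the pdfs (7)-(9) and, therefore, on the $\delta_j$. The misclassification probabilities which define the ROC hypersurface are then given by

\[
\begin{bmatrix} P_{12} \\ P_{13} \\ \vdots \\ P_{1N} \\ \vdots \\ P_{i1} \\ \vdots \\ P_{ij}\ \{i \neq j\} \\ \vdots \\ P_{iN} \\ \vdots \\ P_{N(N-1)} \\ \vdots \\ P_{N1} \end{bmatrix}
=
\begin{bmatrix}
\int_{\mathcal{D}_1} p(\vec{x}|t = \pi_2)\,d^n\vec{x} \\
\int_{\mathcal{D}_1} p(\vec{x}|t = \pi_3)\,d^n\vec{x} \\
\vdots \\
\int_{\mathcal{D}_1} p(\vec{x}|t = \pi_N)\,d^n\vec{x} \\
\vdots \\
\int_{\mathcal{D}_i} p(\vec{x}|t = \pi_1)\,d^n\vec{x} \\
\vdots \\
\int_{\mathcal{D}_i} p(\vec{x}|t = \pi_j)\,d^n\vec{x}\ \{i \neq j\} \\
\vdots \\
\int_{\mathcal{D}_i} p(\vec{x}|t = \pi_N)\,d^n\vec{x} \\
\vdots \\
\int_{\mathcal{D}_N} p(\vec{x}|t = \pi_{N-1})\,d^n\vec{x} \\
\vdots \\
\int_{\mathcal{D}_N} p(\vec{x}|t = \pi_1)\,d^n\vec{x}
\end{bmatrix}. \tag{13}
\]

Using (7) and (8), we can rewrite this as

\[
\begin{bmatrix} P_{12} \\ P_{13} \\ \vdots \\ P_{1N} \\ \vdots \\ P_{i1} \\ \vdots \\ P_{ij}\ \{i \neq j\} \\ \vdots \\ P_{iN} \\ \vdots \\ P_{N(N-1)} \\ \vdots \\ P_{N1} \end{bmatrix}
=
\begin{bmatrix}
P_{1N} + \delta_2 \int_{\mathcal{D}_1} h_2(\vec{x})\,d^n\vec{x} \\
P_{1N} + \delta_3 \int_{\mathcal{D}_1} h_3(\vec{x})\,d^n\vec{x} \\
\vdots \\
P_{1N} \\
\vdots \\
P_{iN} + \delta_1 \int_{\mathcal{D}_i} h_1(\vec{x})\,d^n\vec{x} \\
\vdots \\
P_{iN} + \delta_j \int_{\mathcal{D}_i} h_j(\vec{x})\,d^n\vec{x}\ \{i \neq j\} \\
\vdots \\
P_{iN} \\
\vdots \\
P_{NN} + \delta_{N-1} \int_{\mathcal{D}_N} h_{N-1}(\vec{x})\,d^n\vec{x} \\
\vdots \\
P_{NN} + \delta_1 \int_{\mathcal{D}_N} h_1(\vec{x})\,d^n\vec{x}
\end{bmatrix}. \tag{14}
\]

Defining the functions $H_{ij} = \int_{\mathcal{D}_i} h_j(\vec{x})\,d^n\vec{x}$ allows us to simplify the notation slightly

\[
\begin{bmatrix} P_{12} \\ P_{13} \\ \vdots \\ P_{1N} \\ \vdots \\ P_{i1} \\ \vdots \\ P_{ij}\ \{i \neq j\} \\ \vdots \\ P_{iN} \\ \vdots \\ P_{N(N-1)} \\ \vdots \\ P_{N1} \end{bmatrix}
=
\begin{bmatrix}
P_{1N} + \delta_2 H_{12} \\
P_{1N} + \delta_3 H_{13} \\
\vdots \\
P_{1N} \\
\vdots \\
P_{iN} + \delta_1 H_{i1} \\
\vdots \\
P_{iN} + \delta_j H_{ij}\ \{i \neq j\} \\
\vdots \\
P_{iN} \\
\vdots \\
P_{NN} + \delta_{N-1} H_{N(N-1)} \\
\vdots \\
P_{NN} + \delta_1 H_{N1}
\end{bmatrix}. \tag{15}
\]

Now of course $P_{NN} = 1 - \sum_{i=1}^{N-1} P_{iN}$; for simplicity, we will write $\alpha_i = P_{iN}$. Equation (15) can now be written as

\[
\begin{bmatrix} P_{12} \\ P_{13} \\ \vdots \\ P_{1N} \\ \vdots \\ P_{i1} \\ \vdots \\ P_{ij}\ \{i \neq j\} \\ \vdots \\ P_{iN} \\ \vdots \\ P_{N(N-1)} \\ \vdots \\ P_{N1} \end{bmatrix}
=
\begin{bmatrix}
\alpha_1 + \delta_2 H_{12} \\
\alpha_1 + \delta_3 H_{13} \\
\vdots \\
\alpha_1 \\
\vdots \\
\alpha_i + \delta_1 H_{i1} \\
\vdots \\
\alpha_i + \delta_j H_{ij}\ \{i \neq j\} \\
\vdots \\
\alpha_i \\
\vdots \\
1 - \sum_{i=1}^{N-1} \alpha_i + \delta_{N-1} H_{N(N-1)} \\
\vdots \\
1 - \sum_{j=1}^{N-1} \alpha_j + \delta_1 H_{N1}
\end{bmatrix} \tag{16}
\]

which further simplifies to

\[
\begin{bmatrix} P_{12} \\ \vdots \\ P_{ij}\ \{i \neq j\} \\ \vdots \\ P_{N1} \end{bmatrix}
=
\Bigl[\,\underbrace{\alpha_1, \ldots, \alpha_1}_{N-1\ \text{elements}},\ \ldots,\ \underbrace{\alpha_i, \ldots, \alpha_i}_{N-1\ \text{elements}},\ \ldots,\ \underbrace{1 - \textstyle\sum_{j=1}^{N-1} \alpha_j, \ldots, 1 - \textstyle\sum_{j=1}^{N-1} \alpha_j}_{N-1\ \text{elements}}\,\Bigr]^{T}
+ \sum_{j=1}^{N-1} \delta_j \vec{w}_j \tag{17}
\]

where the vectors $\vec{w}_j$ have components which depend only on $H_{ij}$. The first term on the right-hand side of this equation is just the expression for the “guessing” observer [cf. the left-hand side of (6)]. The other term on the right-hand side of this equation tends to zero as the $\delta_j$ tend to zero. Note that the $H_{ij}$ may in general depend on the $\delta_k$ via (10)-(12), but

\[
\begin{aligned}
|H_{ij}| &= \left| \int_{\mathcal{D}_i} h_j(\vec{x})\,d^n\vec{x} \right| \\
&\le \int_{\mathcal{D}_i} |h_j(\vec{x})|\,d^n\vec{x} \\
&\le \int_{\mathcal{D}_i} p(\vec{x}|t = \pi_N)\,d^n\vec{x} \\
&= P_{iN}.
\end{aligned} \tag{18}
\]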


Thus, the $H_{ij}$ are bounded, and will possess Taylor expansions in $\delta_k$ (i.e., will not depend on terms of the form $\delta_k^{-m}$ for positive integers $m$). Therefore, operating points on the ROC hypersurface of a “near-guessing” observer tend continuously toward points on the ROC hypersurface of the “guessing” observer. Note that the $N(N-1)$ terms $\alpha_i$, $\delta_j H_{ij}$ are not all independent, since they all depend implicitly for fixed $\delta_j$ on the $N^2-N-1$ decision parameters $\vec{\gamma}$. That is, the ROC hypersurface given by (17) possesses only $N^2-N-1$ degrees of freedom.
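
The bound $|H_{ij}| \le P_{iN}$ and the convergence to the “guessing” observer can be checked numerically. The following is only a minimal sketch under stated assumptions: one-dimensional data, $N = 3$, a standard normal density standing in for $p(x|t=\pi_N)$, two hypothetical perturbations $h_1$ and $h_2$ chosen to integrate to zero, and decision regions fixed by two thresholds `a` and `b`. Such a two-threshold observer is itself “degenerate” in the sense used above; it is used here only to illustrate the bound, not the full hypersurface.

```python
import math

def phi(x):
    """Standard normal density, playing the role of p(x | t = pi_N)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Hypothetical perturbations h_j (assumptions for illustration): odd functions
# times phi, so each integrates to zero and satisfies |h_j(x)| <= phi(x).
H_FUNCS = {
    1: lambda x: phi(x) * math.tanh(x),
    2: lambda x: phi(x) * math.sin(x),
}

def integrate(f, lo, hi, steps=4000):
    """Midpoint-rule integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (k + 0.5) * width) for k in range(steps))

def operating_point(deltas, a=-0.5, b=0.7, cutoff=8.0):
    """Return {(i, j): P_ij} for regions D_1=(-inf,a), D_2=[a,b), D_3=[b,inf).

    The tails are truncated at +/- cutoff. Entries with i == j are the
    correct-decision probabilities, which are not ROC coordinates.
    """
    regions = {1: (-cutoff, a), 2: (a, b), 3: (b, cutoff)}
    P = {}
    for i, (lo, hi) in regions.items():
        p_iN = integrate(phi, lo, hi)
        P[(i, 3)] = p_iN
        for j, h_j in H_FUNCS.items():
            # P_ij = P_iN + delta_j * H_ij, as in Eq. (15)
            P[(i, j)] = p_iN + deltas[j] * integrate(h_j, lo, hi)
    return P

if __name__ == "__main__":
    for d in (0.5, 0.05, 0.0):
        P = operating_point({1: d, 2: d})
        gap = max(abs(P[(i, j)] - P[(i, 3)]) for i in (1, 2, 3) for j in (1, 2))
        print(f"delta={d}: largest |P_ij - P_iN| = {gap:.6f}")
```

As the common value of the $\delta_j$ shrinks, the largest deviation from the “guessing” values $P_{iN}$ shrinks with it, and it never exceeds $\delta_j P_{iN}$.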

IV. The Hypervolume Under the ROC Hypersurface of
an N-Class “Near-Guessing” Observer

In the preceding section, it was shown that the ROC hypersurface of a “near-guessing” observer tends continuously to the ROC hypersurface of a “guessing” observer as the pdfs of the observational data tend arbitrarily toward identical distributions. Intuitively, one would expect that the hypervolumes under these hypersurfaces should also tend toward each other. Since intuition can occasionally be an unreliable guide in analyzing $N$-class classification tasks, it would be reassuring if the results of the preceding section could be applied directly to the calculation of the relevant hypervolumes.

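For this section, we will write $P_{ij}$ as $P_{ij}(\vec{\gamma})$, emphasizing that it is a function of the decision parameters chosen. We, thus, rewrite (15) to obtain

\[
\begin{bmatrix} P_{12}(\vec{\gamma}) \\ P_{13}(\vec{\gamma}) \\ \vdots \\ P_{1N}(\vec{\gamma}) \\ \vdots \\ P_{i1}(\vec{\gamma}) \\ \vdots \\ P_{ij}(\vec{\gamma})\ \{i \neq j\} \\ \vdots \\ P_{iN}(\vec{\gamma}) \\ \vdots \\ P_{N,(N-1)}(\vec{\gamma}) \\ \vdots \\ P_{N,1}(\vec{\gamma}) \end{bmatrix}
=
\begin{bmatrix}
P_{1N}(\vec{\gamma}) + \delta_2 H_{12}(\vec{\gamma}) \\
P_{1N}(\vec{\gamma}) + \delta_3 H_{13}(\vec{\gamma}) \\
\vdots \\
P_{1N}(\vec{\gamma}) \\
\vdots \\
P_{iN}(\vec{\gamma}) + \delta_1 H_{i1}(\vec{\gamma}) \\
\vdots \\
P_{iN}(\vec{\gamma}) + \delta_j H_{ij}(\vec{\gamma})\ \{i \neq j\} \\
\vdots \\
P_{iN}(\vec{\gamma}) \\
\vdots \\
P_{N,(N-1)}(\vec{\gamma}) \\
\vdots \\
P_{N,(N-1)}(\vec{\gamma}) - \delta_{N-1} H_{N(N-1)}(\vec{\gamma}) + \delta_1 H_{N1}(\vec{\gamma})
\end{bmatrix}. \tag{19}
\]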
To find the hypervolume under the ROC surface given by $P_{N1}$ considered as a function of $(P_{12}, P_{13}, \ldots, P_{N(N-1)}, \ldots, P_{N2})$, one must evaluate the integral

\[ \int \cdots \int P_{N1}\,d^{N^2-N-1}P. \tag{20} \]

(The domain of the integral is simply the set of all $P_{ij}$ such that $P_{N1}$ is defined.) Note that, for the “guessing” observer, we expect this integral to be zero when $N > 2$ due to dimensionality considerations: the ROC hypersurface has only $N-1$ degrees of freedom (cf. (6)), not the $N^2-N-1$ required in this $(N^2-N)$-dimensional ROC space. To see this explicitly, one can rearrange the order of integration and consider the innermost integral $\int P_{N1}\,dP_{N(N-1)}$ for fixed values of the other misclassification probabilities. Then the limits of integration of this innermost definite integral become, again by (6)

\[ \int_{1 - \sum_j \alpha_j}^{1 - \sum_j \alpha_j} P_{N1}\,dP_{N(N-1)} \quad \{j < N\} \tag{21} \]

which is zero by inspection.
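
The dimensionality argument can be made concrete with a short sketch. It assumes a particular choice of the vectors $\vec{u}_0, \vec{u}_i$ in (6), namely the one obtained by reading the coefficients of each $\alpha_i$ off the middle expression of (6); the function names are hypothetical. The rank of the $\vec{u}_i$ is the number of degrees of freedom of the “guessing” locus, which for $N = 3$ is 2 rather than the 5 needed to enclose any hypervolume.

```python
from fractions import Fraction

def guessing_point(alphas):
    """Operating point of a guessing observer for N = len(alphas) + 1 classes.

    Each alpha_i (i < N) is repeated N - 1 times, followed by N - 1 copies of
    1 - sum(alphas), matching the block structure of Eq. (6).
    """
    n_classes = len(alphas) + 1
    last = 1 - sum(alphas)
    point = []
    for value in list(alphas) + [last]:
        point.extend([value] * (n_classes - 1))
    return point

def basis_vectors(n_classes):
    """Assumed u_0 and u_1..u_{N-1} with point = u_0 + sum_i alpha_i * u_i."""
    block = n_classes - 1              # entries per block
    dim = n_classes * block            # N^2 - N coordinates
    u0 = [0] * (dim - block) + [1] * block
    us = []
    for i in range(n_classes - 1):
        u = [0] * dim
        for k in range(block):
            u[i * block + k] = 1       # block i picks up +alpha_i
            u[dim - block + k] = -1    # last block picks up -alpha_i
        us.append(u)
    return u0, us

def rank(rows):
    """Rank of a list of row vectors by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][col] != 0:
                factor = m[k][col] / m[r][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        r += 1
        if r == len(m):
            break
    return r

if __name__ == "__main__":
    u0, us = basis_vectors(3)
    print("degrees of freedom of guessing locus:", rank(us))
    print("degrees of freedom needed:", 3 * 3 - 3 - 1)
```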

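We now return to the general case of a “near-guessing” observer. One way to evaluate the integral in (20) is to reexpress it explicitly in terms of the decision parameters $\vec{\gamma}$, via the Jacobian

\[
J = \begin{vmatrix}
\dfrac{\partial P_{12}}{\partial \gamma_1} & \dfrac{\partial P_{12}}{\partial \gamma_2} & \dfrac{\partial P_{12}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{12}}{\partial \gamma_{N^2-N-1}} \\
\vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial P_{1N}}{\partial \gamma_1} & \dfrac{\partial P_{1N}}{\partial \gamma_2} & \dfrac{\partial P_{1N}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{1N}}{\partial \gamma_{N^2-N-1}} \\
\vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial P_{ij}}{\partial \gamma_1} & \dfrac{\partial P_{ij}}{\partial \gamma_2} & \dfrac{\partial P_{ij}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{ij}}{\partial \gamma_{N^2-N-1}} \\
\vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial P_{N(N-1)}}{\partial \gamma_1} & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_2} & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_{N^2-N-1}} \\
\vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial P_{N2}}{\partial \gamma_1} & \dfrac{\partial P_{N2}}{\partial \gamma_2} & \dfrac{\partial P_{N2}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{N2}}{\partial \gamma_{N^2-N-1}}
\end{vmatrix} \tag{22}
\]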


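where the vertical bars indicate that the determinant of the enclosed matrix is to be taken, and where $\gamma_i$ denotes the $i$th component of $\vec{\gamma}$. (We assume that indices of the parameters $\vec{\gamma}$ have been chosen appropriately so that no negative sign is introduced, i.e., volumes remain positive.) For the “guessing” observer, this reduces to

\[
J_{\text{guessing}} = \begin{vmatrix}
\dfrac{\partial P_{1N}}{\partial \gamma_1} & \dfrac{\partial P_{1N}}{\partial \gamma_2} & \dfrac{\partial P_{1N}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{1N}}{\partial \gamma_{N^2-N-1}} \\
\vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial P_{1N}}{\partial \gamma_1} & \dfrac{\partial P_{1N}}{\partial \gamma_2} & \dfrac{\partial P_{1N}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{1N}}{\partial \gamma_{N^2-N-1}} \\
\vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial P_{iN}}{\partial \gamma_1} & \dfrac{\partial P_{iN}}{\partial \gamma_2} & \dfrac{\partial P_{iN}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{iN}}{\partial \gamma_{N^2-N-1}} \\
\vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial P_{N(N-1)}}{\partial \gamma_1} & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_2} & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_{N^2-N-1}} \\
\vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial P_{N(N-1)}}{\partial \gamma_1} & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_2} & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_3} & \cdots & \dfrac{\partial P_{N(N-1)}}{\partial \gamma_{N^2-N-1}}
\end{vmatrix} \tag{23}
\]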


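where $P_{N(N-1)} = P_{NN} = 1 - \sum_{j=1}^{N-1} P_{jN}$. For a “near-guessing” observer, we combine (19) and (22) to obtain

\[
J_{\text{near}} = \begin{vmatrix}
\dfrac{\partial (P_{1N} + \delta_2 H_{12})}{\partial \gamma_k} \\
\vdots \\
\dfrac{\partial P_{1N}}{\partial \gamma_k} \\
\vdots \\
\dfrac{\partial (P_{iN} + \delta_j H_{ij})}{\partial \gamma_k} \\
\vdots \\
\dfrac{\partial P_{N(N-1)}}{\partial \gamma_k} \\
\vdots \\
\dfrac{\partial (P_{N(N-1)} - \delta_{N-1} H_{N(N-1)} + \delta_2 H_{N2})}{\partial \gamma_k}
\end{vmatrix} \tag{24}
\]

where each row shown runs over $k = 1, \ldots, N^2-N-1$.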

From the properties of determinants [22], it can be shown that, to first order in the $\delta_j$,

\[ J_{\text{near}} = J_{\text{guessing}} + \sum_{j=1}^{N-1} \delta_j J_j + \cdots \tag{25} \]

where the $J_j$ are bounded and continuous with respect to the $\delta_j$.
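
One standard route to an expansion of this form is the multilinearity of the determinant, summarized by Jacobi's formula. The following is a sketch only, not necessarily the argument intended by [22]; the matrices $A$ and $B_j$ are notation introduced here, with $A$ the matrix of partial derivatives of the $P_{iN}$ appearing in $J_{\text{guessing}}$ and $B_j$ the matrix collecting the partial derivatives of the $H_{ij}$ that multiply $\delta_j$.

```latex
\[
\det\Bigl(A + \sum_{j=1}^{N-1} \delta_j B_j\Bigr)
  = \det A
  + \sum_{j=1}^{N-1} \delta_j \operatorname{tr}\bigl(\operatorname{adj}(A)\,B_j\bigr)
  + (\text{terms containing products of two or more } \delta_j)
\]
```

Under these assumptions, $\det A$ plays the role of $J_{\text{guessing}}$ and $\operatorname{tr}(\operatorname{adj}(A)\,B_j)$ the role of the leading part of $J_j$.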

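If we denote the hypervolume under the ROC hypersurface of the “guessing” observer by

\[
\begin{aligned}
I_{\text{guessing}} &= \int \cdots \int P_{N1}\,d^{N^2-N-1}P \\
&= \int \cdots \int P_{N,(N-1)}(\vec{\gamma})\,J_{\text{guessing}}\,d^{N^2-N-1}\gamma,
\end{aligned} \tag{26}
\]

then the hypervolume under the ROC hypersurface of a “near-guessing” observer becomes, again to first order in the $\delta_j$,

\[
\begin{aligned}
I_{\text{near}} &= \int \cdots \int \bigl[P_{N,(N-1)}(\vec{\gamma}) - \delta_{N-1} H_{N(N-1)}(\vec{\gamma}) + \delta_1 H_{N1}(\vec{\gamma})\bigr] \Bigl[J_{\text{guessing}} + \sum_{j=1}^{N-1} \delta_j J_j + \cdots\Bigr] d^{N^2-N-1}\gamma && (27) \\
&= I_{\text{guessing}} + \sum_{j=1}^{N-1} \delta_j I_j + \cdots && (28)
\end{aligned}
\]

where the integrals $I_j$ are bounded (i.e., they may depend on higher integral powers of $\delta_j$, but not on $\delta_j^{-m}$ for positive integers $m$). That is, in the limit as the $\delta_j$ tend toward zero, $I_{\text{near}}$ tends toward $I_{\text{guessing}}$ in a continuous fashion.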


V. The Hypervolume Under the ROC Hypersurface of
an N-Class “Near-Perfect” Observer

In the preceding sections, we established that the hypervolume under the ROC hypersurface of a “guessing” observer is zero, and furthermore that this result is not singular: an observer in a “near-guessing” task will achieve a ROC hypersurface with hypervolume approaching zero continuously as the data pdfs approach identity. An ideal observer in a “perfect” task (i.e., one in which the data pdfs never overlap) will also achieve a ROC hypersurface with zero hypervolume, because it can achieve the operating point $\vec{0}$ and, thus, will not, for any rational decision rule, achieve points interior to the unit hypercube defining ROC space. It is reasonable to ask whether “near-perfect” observers, performing tasks for which the overlap in the underlying data pdfs is nearly negligible, behave similarly to “near-guessing” observers, in the sense that the hypervolume under the ROC hypersurface of such an observer will approach zero in a continuous fashion.

Consider observational data $\vec{x}$ drawn from $N$ pdfs $p(\vec{x}|t = \pi_j)$ where $1 \le j \le N$. We denote the mean of $p(\vec{x}|t = \pi_j)$ by $\vec{\mu}_j$ and note that, without loss of generality, the mean of $p(\vec{x}|t = \pi_N)$ can be taken to be $\vec{0}$. Furthermore, note that we can apply a linear transformation to the data $\vec{x}$ and, thus, effectively to the $\vec{\mu}_j$, such that each of the resulting $\vec{\mu}_j$ is either 1) mutually orthogonal to, or 2) a scalar multiple of, any of the other $\vec{\mu}_i$. Because the transformation applied is linear, the ideal observer for this task will remain the same, and hence the task itself can be considered essentially unchanged.

Let us consider now an observer for this task which is generally not ideal; in fact, we will consider only a single operating point achieved by this observer. The observer decides $d = \pi_j$ for a given observation $\vec{x}$ if

\[ |\vec{x} - \vec{\mu}_j| < |\vec{x} - \vec{\mu}_i| \quad \{i : 1 \le i \le N,\ j \neq i\} \tag{29} \]

with equality for any such relation between two classes being decided in an arbitrary but consistent manner. That is, the observer places hyperplanes between the means of any two classes when attempting to decide between those classes (rather than placing those hyperplanes in the likelihood ratio decision variable space, as would the ideal observer).

Now suppose the task is made slightly “easier,” while the observer itself remains unchanged. That is, consider the mean of one pdf, say $\vec{\mu}_i$ for $i \neq N$, being increased by a factor $1 + \delta$ for $0 < \delta < 1$, while the location of the decision hyperplanes does not change, except in the special case where $\vec{\mu}_j = a\vec{\mu}_i$ for some other pdf (again with $j \neq N$). In this latter case we increase both means ($\vec{\mu}'_j = (1 + \delta)\vec{\mu}_j$, $\vec{\mu}'_i = (1 + \delta)\vec{\mu}_i$), and the location of the corresponding decision hyperplane shifts accordingly.

Note that $\vec{\mu}'_i$ is now further away from each decision hyperplane relevant to $d = \pi_i$ in (29). In the case $\vec{\mu}_j = a\vec{\mu}_i$, the decision hyperplane is now a distance of $|(\vec{\mu}'_j - \vec{\mu}'_i)/2| = (1 + \delta)\,|(\vec{\mu}_j - \vec{\mu}_i)/2|$ from $\vec{\mu}'_i$. For noncollinear $\vec{\mu}_j$, the direction from $\vec{\mu}_i$ to the decision hyperplane is given by $\vec{\mu}_j - \vec{\mu}_i$, and since $\vec{\mu}_j$ and $\vec{\mu}'_i$ are orthogonal, $(\vec{\mu}'_i - \vec{\mu}_i) \cdot (\vec{\mu}_j - \vec{\mu}_i) = -\delta|\vec{\mu}_i|^2$; since this quantity is negative, it follows that $\vec{\mu}'_i$ is further from that decision plane than $\vec{\mu}_i$.

It immediately follows from this that none of the misclassification probabilities making up the coordinates of the observer’s operating point can increase when moving from the old task to the new one. To see this, consider a change of coordinates in the data space such that $\vec{\mu}'_i$ is now the origin. All of the decision hyperplanes separating this class from the others are effectively moving away from the center of its pdf; since the hyperplanes are translating without rotating, we see immediately that the probability $P_{ii}$ cannot decrease (and will increase in general), while the other probabilities $P_{ji}$ ($j \neq i$) cannot increase (and will decrease in general).

Note that any pdf $p(\vec{x})$ must decrease more rapidly than $|\vec{x}|^{-n}$ for sufficiently large $|\vec{x}|$, where $n$ is the dimensionality of $\vec{x}$. This allows us to state qualitatively the sense in which the observer under consideration is “near-perfect”: we hypothesize that the $|\vec{\mu}_j|$ are all sufficiently large that this limiting condition is met. Given this condition, the only situation in which an error probability $P_{ji}$ ($j \neq i$) will fail to decrease is if this probability is already zero. By allowing all of the $|\vec{\mu}_j|$ to increase in the manner described above, we can clearly obtain in general a situation in which each of the misclassification probabilities is either decreasing, or equal to zero.

Fig. 1. Operating point of an observer in a two-class classification task with coordinates $(\mathrm{FPF}_0, \mathrm{FNF}_0)$, denoted by the point at the lower left corner of the crosshatched region. Since no rational observer will achieve points in the crosshatched region, the area under this observer’s ROC curve cannot be greater than $1 - (1 - \mathrm{FPF}_0)(1 - \mathrm{FNF}_0)$.

This implies that the hypervolume under the ROC hypersurfaces of the observers under consideration (however we chose to define their decision rules for operating points other than those described above) must also decrease as the task is made “easier” as described above. To see this, note that if a given observer achieves an operating point $\vec{P}$ on its ROC hypersurface, it cannot achieve another point $\vec{P}'$ such that the components of these points satisfy $P'_i > P_i$ ($1 \le i \le N^2-N$) (because such an observer could be replaced by an observer which achieved $\vec{P}$ for all such points by using the original decision rule for the point $\vec{P}$, thereby achieving unambiguously better performance at those points). Thus, knowing that a given observer achieves an operating point of $\vec{P}$ implies that that observer’s ROC hypersurface must have a hypervolume under it of no greater than $1 - \prod_{i=1}^{N^2-N} (1 - P_i)$; as the (nonzero) $P_i$ decrease, this upper limit on the hypervolume must also decrease to zero. This point is illustrated in Fig. 1 for the two-class case; here the observer’s false-negative fraction, $\mathrm{FNF}_0$, corresponds to $P_{21}$, and the false-positive fraction, $\mathrm{FPF}_0$, corresponds to $P_{12}$.
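
The upper limit $1 - \prod_i (1 - P_i)$ is easy to evaluate for any known operating point. The following minimal sketch (the function name is hypothetical) computes it and shows the bound shrinking as every misclassification probability is scaled toward zero.

```python
import math

def hypervolume_upper_bound(operating_point):
    """Upper bound 1 - prod(1 - P_i) implied by one known operating point."""
    return 1.0 - math.prod(1.0 - p for p in operating_point)

if __name__ == "__main__":
    # Two-class case of Fig. 1: FPF_0 = 0.1, FNF_0 = 0.2 gives about 0.28.
    print(hypervolume_upper_bound([0.1, 0.2]))
    # Three-class case (six coordinates), made progressively "easier".
    for scale in (1.0, 0.1, 0.01):
        point = [0.2 * scale] * 6
        print(scale, hypervolume_upper_bound(point))
```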

To summarize, we have shown that the known operating point of our simple observer will move closer to the origin for arbitrary data pdfs as those pdfs are moved further apart (i.e., as the underlying task is made “easier”), implying that the hypervolume under its ROC hypersurface will also converge to zero. In fact, reasoning as above, one can see that the ideal observer will also be unable to achieve operating points within the region $P'_i > P_i$ ($1 \le i \le N^2-N$), since the ideal observer’s ROC hypersurface is never above that of any other observer at any given point in the domain of the ROC space [15]. The hypervolume under the ideal observer’s ROC hypersurface will, thus, also converge to zero as the underlying data pdfs are moved apart.

VI.  Conclusion 

In $N$-class classification tasks where $N > 2$, it can be shown that the hypervolumes under the ROC hypersurfaces of both the “guessing” observer and the “perfect” observer are zero. More importantly, we have shown in each of these performance extremes that the convergence to zero is smooth rather than discontinuous. This convergence can be considered completely general for “near-guessing” observers and generally true for “near-perfect” observers which follow rational decision rules (analogous to false-negative fraction and false-positive fraction being monotonically related in a two-class task); that is, the conclusions appear to hold true for arbitrary underlying data pdfs.

In the two-class classification task, the area under the ROC curve (AUC) is considered a useful performance metric for a variety of reasons. One of the most pleasing and straightforward of these is the simple relationship between AUC and the “separability” of the two underlying data pdfs (i.e., the difficulty of the task). Namely, the AUC (with the two-class ROC defined as a plot of false-negative fraction versus false-positive fraction) of a “perfect” observer is zero, and increases in some sense uniformly as the task is made more difficult, until one arrives at the “guessing” observer with an AUC of 0.5. In an $N$-class classification task, this straightforward relationship appears to break down, and both “perfect” and “guessing” observers yield ROC hypersurfaces with zero hypervolume. It would appear that, due to this ambiguity, hypervolume under the ROC hypersurface of an $N$-class observer is not a useful performance metric: Does a hypervolume of 0.005 indicate an observer faced with an exceptionally difficult or exceptionally easy task? One hopes that some other performance metric from two-class classification can be generalized usefully for $N$-class classification; perhaps a quantity which is equal to AUC in the two-class case has a generalization which is not equal to the hypervolume, but can be shown to be of use for other reasons.
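
The two-class behavior described above can be reproduced with a small empirical calculation. This is a minimal sketch with hypothetical names, assuming that higher scores indicate the abnormal class and using the fact that the area under a plot of false-negative fraction versus false-positive fraction equals one minus the conventional (Mann-Whitney) AUC.

```python
def area_under_fnf_vs_fpf(abnormal_scores, normal_scores):
    """Empirical area under the FNF-versus-FPF curve.

    Counts the fraction of (abnormal, normal) pairs that are ranked in the
    wrong order, with ties counted as one half.
    """
    wrong = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a < n:
                wrong += 1.0
            elif a == n:
                wrong += 0.5
    return wrong / (len(abnormal_scores) * len(normal_scores))

if __name__ == "__main__":
    # Perfectly separated scores: the "perfect" observer.
    print(area_under_fnf_vs_fpf([3, 4, 5], [0, 1, 2]))
    # Scores carrying no information: the "guessing" observer.
    print(area_under_fnf_vs_fpf([1, 1], [1, 1, 1]))
```

The first call returns 0.0 and the second 0.5, matching the two extremes of the two-class task; no analogous monotonic scale is available from the hypervolume when $N > 2$.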
ABSTRACT

Bayesian  artificial  neural  networks  (BANNs)  have  proven  useful  in  two-class  classification  tasks,  and  are  claimed 
to  provide  good  estimates  of  ideal-observer-related  decision  variables  (the  a  posteriori  class  membership  probabil¬ 
ities).  We  wish  to  apply  the  BANN  methodology  to  three-class  classification  tasks  for  computer-aided  diagnosis, 
but  we  currently  lack  a  fully  general  extension  of  two-class  receiver  operating  characteristic  (ROC)  analysis  to 
objectively  evaluate  three-class  BANN  performance.  It  is  well  known  that  “the  likelihood  ratio  of  the  likelihood 
ratio  is  the  likelihood  ratio.”  Based  on  this,  we  found  that  the  decision  variable  which  is  the  a  posteriori  class 
membership  probability  of  an  observational  data  vector  is  in  fact  equal  to  the  a  posteriori  class  membership 
probability  of  that  decision  variable.  Under  the  assumption  that  a  BANN  can  provide  good  estimates  of  these  a 
posteriori  probabilities,  a  second  BANN  trained  on  the  output  of  such  a  BANN  should  perform  very  similarly  to 
an  identity  function.  We  performed  a  two-class  and  a  three-class  simulation  study  to  test  this  hypothesis.  The 
mean squared error (deviation from an identity function) of a two-class BANN was found to be $2.5 \times 10^{-4}$. The
mean squared error of the first component of the output of a three-class BANN was found to be $2.8 \times 10^{-4}$, and
that of its second component was found to be $3.8 \times 10^{-4}$. Although we currently lack a fully general method to
objectively  evaluate  performance  in  a  three-class  classification  task,  circumstantial  evidence  suggests  that  two- 
and  three-class  BANNs  can  provide  good  estimates  of  ideal-observer-related  decision  variables. 

Keywords:  Bayesian  artificial  neural  networks,  ideal  observers,  three-class  classification 

1.  INTRODUCTION 

In  the  past,  computerized  methods  for  the  detection1-5  and  classification6-11  of  mammographic  mass  lesions  have 
been  investigated  at  the  University  of  Chicago.  The  classification  scheme  currently  analyzes  lesions  which  have 
been  manually  identified  by  a  radiologist.  We  are  attempting  to  develop  a  fully  automated  classification  scheme 
by  combining  the  existing  detection  and  classification  schemes;  we  have  argued  previously12  that  this  will  require 
a  three-class  classifier  to  account  for  the  presence  of  false-positive  (FP)  computer  detections,  in  addition  to  the 
malignant  and  benign  lesions,  in  the  output  of  the  detection  scheme. 

For  some  time  now  we  have  explored  the  use  of  Bayesian  artificial  neural  networks  (BANNs)  for  a  variety  of 
detection5, 13, 14  and  classification11  tasks  in  computer-aided  diagnosis  (CAD).  Our  motivation  for  investigating 
BANNs  is  based,  first,  on  our  theoretical  observation  that,  in  the  limit  of  infinite  training  data,  a  BANN  will 
yield  an  ideal  observer  decision  function  for  that  data  population;15  and  second,  on  empirical  observations 
that  even  given  a  finite  sample  of  training  data,  a  BANN  can  estimate  an  ideal  observer  decision  function 
reasonably well.16 (We note that the BANN implementation we are using is that of MacKay,17 which employs a
multivariate  normal  function  for  the  prior  distribution  on  the  network  weight  values.)  We  have  also  performed 
simulation  studies  showing  that  BANNs  can  accurately  estimate  ideal  observer  decision  variables  in  a  three-class 
classification  task.15  Moreover,  we  showed  recently  that  a  three-class  BANN  could  produce  decision  variables  for 
actual  mammographic  mass  lesion  feature  data,  and  that  these  decision  variables  are  related  to  two-class  BANN 
decision  variable  data  in  a  particular  way  consistent  with  a  theoretical  relationship  between  three-class  and  two- 
class  ideal  observer  decision  variables.12  We  consider  this  to  be  strong  circumstantial  evidence  for  the  ability 
of  a  BANN  to  estimate  three-class  ideal  observer  decision  variables,  though  we  currently  lack  a  fully  general 
method  for  evaluating  three-class  classifiers  (i.e.,  a  three-class  extension  to  receiver  operating  characteristic 
(ROC)  analysis). 
In this work, we present further circumstantial evidence toward the claim that a BANN can provide good
estimates  of  three-class  ideal  observer  decision  variables.  We  develop  a  theoretical  relationship  between  the 
a  posteriori  class  membership  probabilities  of  a  given  observational  data  variable  and  the  a  posteriori  class 
membership  probabilities  of  those  a  posteriori  probabilities  treated  as  a  set  of  observational  data  in  their  own 
right.  (It  is  known  that  a  posteriori  class  membership  probabilities  are  equivalent  to  ideal  observer  decision 
variables  in  a  two-class  task,16  and  related  in  a  straightforward  way  to  the  ideal  observer  decision  variables  in  a 
task  with  three  or  more  classes.15)  We  then  describe  simulation  studies  to  train  and  test  a  set  of  BANNs,  and 
present  results  of  such  a  simulation  study  verifying  that  the  BANNs  we  examined  did  indeed  obey  the  theoretical 
relationship  predicted  for  ideal  observer  decision  variables,  to  within  experimental  error.  In  the  final  section,  we 
present  our  conclusions  drawn  from  this  work. 


2.  THEORY 

It  is  well  known  that  the  ideal  observer  decision  variable,  i.e.,  the  likelihood  ratio  or  any  monotonic  transformation 
of  this  value,  yields  optimal  performance  in  a  two-class  classification  task.18  It  can  also  be  shown,  in  a  classification 
task  with  N  classes  (N  >  2),  that  the  ideal  observer  decision  rule  becomes  more  complicated  than  a  simple 
threshold  on  a  single  decision  variable,  but  that  the  optimal  decision  variables  remain  a  set  of  N  —  1  likelihood 

ratios.18’19 

We can define the $i$th likelihood ratio as

\[ \Lambda_i = \mathrm{LR}_i(\vec{x}) = \frac{p(\vec{x}|\pi_i)}{p(\vec{x}|\pi_N)}, \tag{1} \]

where $\vec{x}$ represents statistically variable observational data (which we assume to have dimensionality $n$), and $\pi_i$ represents one of the $N$ classes from which the data are drawn (here $1 \le i \le N-1$). Clearly the vector (of dimensionality $N-1$) of decision variables $\Lambda_i$ is itself statistically variable, and one might ask what the likelihood ratios of these variables are. In fact,20

=  H^v %&*”■■■*• 

=  /•  •  •/  E  qfdyr dI" ' ' ' 1 (2) 

where we have assumed that $N-1 < n$; if $N-1 = n$, then no integration is performed. (If $N-1 > n$, then at least one of the likelihood ratio decision variables will be expressible as a function of the others; we will not consider this degenerate case here.) The sum is over all solutions to Eq. 1 for a given $\vec{\Lambda}$; this yields

\[ \mathrm{LR}_i(\vec{\Lambda}) = \Lambda_i, \tag{3} \]

the source of the well-known adage that “the likelihood ratio of the likelihood ratio is the likelihood ratio.”

Consider now a different set of decision variables, the a posteriori class membership probabilities considered as functions of the statistically variable observational data

\[ y_i = P(\pi_i|\vec{x}). \tag{4} \]

(Since $P(\pi_N|\vec{x}) = 1 - \sum_{i=1}^{N-1} P(\pi_i|\vec{x})$, we still have $N-1$ decision variables.) Note that in a two-class classification task, this decision variable is known to be a monotonic function of the likelihood ratio, and is therefore an ideal observer decision variable;16 while in a classification task with more than two classes, the a posteriori class membership probabilities can be shown to be related to the likelihood ratios in a straightforward way.15

Reasoning as above, we may ask what the a posteriori class membership probability of these decision variables, or $P(\pi_i|\vec{y})$, is. In fact,
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and  this  relation  can  also  be  inverted  to  yield 


LRi(f)  = 
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We  again  start  with  Eq.  2,  this  time  obtaining 
where the sums in $j$ are over all solutions to Eq. 4 for a given $\vec{y}$. (The fraction can be taken out of the integral because the relations in Eqs. 5 and 6 are one-to-one, and thus the set of all solutions to Eq. 4 corresponds to a single value of $\mathrm{LR}_i(\vec{x}_j)$.) This again yields

\[ \mathrm{LR}_i(\vec{y}) = \mathrm{LR}_i(\vec{x}_j) \tag{8} \]

where $\vec{y}$ is the vector of a posteriori class membership probabilities of $\vec{x}$ from Eq. 4, and $\vec{x}_j$ is any solution to that equation for a given $\vec{y}$.

It  follows  that 

LRi(y)P(7Ti)/P(7rjy) 
where $\vec{x}_j$ is again any solution to Eq. 4 for a given $\vec{y}$. This shows that a similar adage to that for likelihood ratios holds true, namely that “the a posteriori class probabilities of the (data) a posteriori class probabilities are the (data) a posteriori class probabilities.”
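
The adage can be verified exactly in a toy setting. The sketch below is an illustration under assumptions that are not part of the study itself: it uses discrete data with four possible values, two classes, and made-up priors and class-conditional probabilities, so that sums replace the integrals. Two of the data values share the same posterior, which makes the grouping over “all solutions for a given $y$” non-trivial.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical priors and class-conditional pmfs over x in {0, 1, 2, 3}.
PRIORS = {1: Fraction(1, 3), 2: Fraction(2, 3)}
PMF = {
    1: {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)},
    2: {0: Fraction(1, 4), 1: Fraction(1, 8), 2: Fraction(1, 8), 3: Fraction(1, 2)},
}

def posterior(x):
    """y = P(pi_1 | x) by Bayes' rule."""
    joint = {c: PRIORS[c] * PMF[c][x] for c in PRIORS}
    return joint[1] / sum(joint.values())

def posterior_given_y():
    """Map each attainable y to P(pi_1 | y), summing over all x with that y."""
    joint_by_y = defaultdict(lambda: {1: Fraction(0), 2: Fraction(0)})
    for x in PMF[1]:
        y = posterior(x)
        for c in PRIORS:
            joint_by_y[y][c] += PRIORS[c] * PMF[c][x]
    return {y: joint[1] / (joint[1] + joint[2]) for y, joint in joint_by_y.items()}

if __name__ == "__main__":
    for y, p_of_y in sorted(posterior_given_y().items()):
        print(f"y = {y}: P(pi_1 | y) = {p_of_y}")
```

Exact rational arithmetic is used so that the equality $P(\pi_1|y) = y$ can be checked without rounding error.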


3.  MATERIALS  AND  METHOD 

We  have  shown  in  the  past16  that  a  BANN  can  provide  good  estimates  of  the  a  posteriori  class  membership 
probabilities  in  a  two-class  classification  task,  and  we  have  presented  the  results  of  simulation  studies15  and 
experiments  with  real  mammographic  feature  data12  strongly  suggesting  that  the  same  holds  true  for  three-class 
BANNs  as  well.  The  theoretical  relationship  given  by  Eq.  9,  derived  in  the  preceding  section,  provides  a  basis 
for  another  simulation  study  which  should  provide  further  circumstantial  evidence  for  the  claim  that  two-class 
and  three-class  BANNs  can  provide  good  estimates  of  the  two-  and  three-class  a  posteriori  class  membership 
probabilities  (directly  related  to  the  ideal  observer  decision  variables  via  Eq.  5),  respectively. 

Specifically,  for  the  two-class  simulation  study,  we  drew  500  samples  pseudorandomly  from  each  of  two 
distributions: 

$$p(x \mid \pi_1) = N(x;\ \mu_1 = 1,\ \sigma_1^2 = 2) \tag{10}$$

$$p(x \mid \pi_2) = N(x;\ \mu_2 = 0,\ \sigma_2^2 = 1). \tag{11}$$

We then trained a two-class BANN with one input, five hidden units, and one output on this data, obtaining a
classifier we denote by

$$y = B_1^2(x). \tag{12}$$

(The  superscript  denotes  the  number  of  classes  being  classified.)  We  then  used  this  output,  given  the  known 
truth  states  for  the  original  observations  x  from  which  it  was  obtained,  as  training  data  for  a  second  BANN  with 
one  input,  five  hidden  units,  and  one  output: 

$$z = B_2^2(y). \tag{13}$$

Finally,  we  pseudorandomly  sampled  an  independent  testing  set  of  500  observations  x  from  each  of  the  two 
classes given in Eqs. 10 and 11. This testing set was used as input to the first BANN to obtain a testing set
$y^{\mathrm{test}}$; this in turn was given as input to the second BANN, for which the output was $z^{\mathrm{test}}$.

Given  Eq.  9,  together  with  the  assumption  that  an  adequately  trained  two-class  BANN  yields  good  estimates 
of  the  a  posteriori  class  membership  probabilities  of  the  observations  being  classified,  it  should  be  the  case  that 
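$z^{\mathrm{test}}$ estimates $y^{\mathrm{test}}$ at least to within experimental error. To verify this, we plotted $z^{\mathrm{test}}$ as a function of $y^{\mathrm{test}}$
for each of the two classes, and we computed the mean squared error

$$\mathrm{MSE}^2 = \frac{1}{1000} \sum \left( z^{\mathrm{test}} - y^{\mathrm{test}} \right)^2, \tag{14}$$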

where  the  sum  is  over  all  the  observations  in  the  two  classes. 
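
The two-class procedure above can be mimicked without a BANN, as a rough numerical check of the identity behind Eq. 9. The sketch below is not the method used in this work: it replaces the first BANN by the exact a posteriori probability computed from Eqs. 10 and 11, and replaces the second BANN by a simple histogram estimate of $P(\pi_1 \mid y)$. It assumes equal prior probabilities (equal sample sizes per class) and that the second parameter of each normal distribution is a variance; the function names, bin count, and sample sizes are illustrative choices.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_class1(x):
    """Exact P(pi_1 | x) with equal priors (stand-in for the first BANN)."""
    p1 = normal_pdf(x, 1.0, 2.0)  # assumed: mean 1, variance 2
    p2 = normal_pdf(x, 0.0, 1.0)  # assumed: mean 0, variance 1
    return p1 / (p1 + p2)


def sample(n_per_class, rng):
    """Draw n_per_class observations per class; return (y, truth) pairs."""
    data = []
    for _ in range(n_per_class):
        data.append((posterior_class1(rng.gauss(1.0, math.sqrt(2.0))), 1))
        data.append((posterior_class1(rng.gauss(0.0, 1.0)), 0))
    return data


def fit_histogram(train, n_bins):
    """Estimate P(pi_1 | y) in each bin of y (stand-in for the second BANN)."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for y, truth in train:
        b = min(int(y * n_bins), n_bins - 1)
        hits[b] += truth
        counts[b] += 1
    # Empty bins fall back to the bin centre.
    return [hits[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def mse_second_stage(n_train=50000, n_test=10000, n_bins=50, seed=0):
    """Mean squared difference between z_test and y_test, in the spirit of Eq. 14."""
    rng = random.Random(seed)
    table = fit_histogram(sample(n_train, rng), n_bins)
    test = sample(n_test, rng)
    total = 0.0
    for y, _ in test:
        z = table[min(int(y * n_bins), n_bins - 1)]
        total += (z - y) ** 2
    return total / len(test)


if __name__ == "__main__":
    print(mse_second_stage())
```

If the identity holds, the second-stage estimate should reproduce its own input up to sampling and binning error, so the returned value should be small. The mapping from $x$ to $y$ here is two-to-one, which is the situation the sums over solutions $x_j$ are meant to cover.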

Similarly,  for  the  three-class  simulation  study,  we  drew  500  two-dimensional  samples  pseudorandomly  from 
each  of  three  distributions: 

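$$p(\vec{x} \mid \pi_1) = N\!\left(\vec{x};\ \vec{\mu}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},\ \Sigma_1 = \begin{pmatrix} 4 & .75 \times 2 \\ .75 \times 2 & 1 \end{pmatrix}\right) \tag{15}$$

$$p(\vec{x} \mid \pi_2) = N\!\left(\vec{x};\ \vec{\mu}_2 = \begin{pmatrix} \ldots \\ 0 \end{pmatrix},\ \Sigma_2 = \begin{pmatrix} \ldots & \ldots \\ -.4 \times 1.5 & \ldots \end{pmatrix}\right) \tag{16}$$

$$p(\vec{x} \mid \pi_3) = N\!\left(\vec{x};\ \vec{\mu}_3 = \begin{pmatrix} \ldots \\ \ldots \end{pmatrix},\ \Sigma_3 = \begin{pmatrix} \ldots & \ldots \\ 0 & 1 \end{pmatrix}\right) \tag{17}$$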

We  then  trained  a  three-class  BANN  with  two  inputs,  five  hidden  units,  and  two  outputs  on  this  data,  obtaining 
a  classifier  we  denote  by 

$$\vec{y} = B_1^3(\vec{x}). \tag{18}$$


We  then  used  this  output,  given  the  known  truth  states  for  the  original  observations  x  from  which  it  was  obtained, 
as  training  data  for  a  second  BANN  with  two  inputs,  five  hidden  units,  and  two  outputs: 

$$\vec{z} = B_2^3(\vec{y}). \tag{19}$$


Figure 1. Output of the second two-class BANN as a function of its input for the observations actually drawn from class
$\pi_1$ in the two-class simulation study.

Finally, we pseudorandomly sampled an independent testing set of 500 observations $\vec{x}$ from each of the three
classes given in Eqs. 15-17. This testing set was used as input to the first BANN to obtain a testing set $\vec{y}^{\,\mathrm{test}}$;
this in turn was given as input to the second BANN, for which the output was $\vec{z}^{\,\mathrm{test}}$.

Again, given Eq. 9, together with the assumption that an adequately trained three-class BANN yields good
estimates of the a posteriori class membership probabilities of the observations being classified, it should be the
case that $z_1^{\mathrm{test}}$ estimates $y_1^{\mathrm{test}}$, and $z_2^{\mathrm{test}}$ estimates $y_2^{\mathrm{test}}$, at least to within experimental error. To verify this, we
plotted $z_1^{\mathrm{test}}$ as a function of $y_1^{\mathrm{test}}$, and $z_2^{\mathrm{test}}$ as a function of $y_2^{\mathrm{test}}$, for each of the three classes, and we computed
the mean squared errors

$$\mathrm{MSE}^3_i = \frac{1}{1500} \sum \left( z_i^{\mathrm{test}} - y_i^{\mathrm{test}} \right)^2, \tag{20}$$

$\{i : 1, 2\}$, where the sum is over all the observations in the three classes.

4.  RESULTS 

Figure 1 shows $z^{\mathrm{test}}$ as a function of $y^{\mathrm{test}}$ for the observations in class $\pi_1$, and Fig. 2 shows $z^{\mathrm{test}}$ as a function
of $y^{\mathrm{test}}$ for the observations in class $\pi_2$ from the two-class simulation study. The mean squared error for the
complete set of 1000 observations was $2.5 \times 10^{-4}$.

Figure 3 shows the components of $\vec{z}^{\,\mathrm{test}}$ as a function of the corresponding components of $\vec{y}^{\,\mathrm{test}}$ for the
observations in class $\pi_1$. Similarly, Fig. 4 shows the components of $\vec{z}^{\,\mathrm{test}}$ as a function of the corresponding
components of $\vec{y}^{\,\mathrm{test}}$ for the observations in class $\pi_2$, and Fig. 5 shows the components of $\vec{z}^{\,\mathrm{test}}$ as a function of
the corresponding components of $\vec{y}^{\,\mathrm{test}}$ for the observations in class $\pi_3$. The mean squared error for the complete
set of 1500 observations was $2.8 \times 10^{-4}$ for the first component and $3.8 \times 10^{-4}$ for the second component.

5.  DISCUSSION  AND  CONCLUSIONS 

We  developed  a  theoretical  relationship  between  the  a  posteriori  class  membership  probabilities,  directly  related 
to  ideal  observer  decision  variables,  and  the  a  posteriori  class  membership  probabilities  of  those  a  posteriori 
class  membership  probabilities  treated  as  statistically  variable  observer  data  in  their  own  right.  The  identity 
relationship  found  is,  perhaps  unsurprisingly,  quite  similar  in  spirit  to  the  identity  relationship  between  the 
likelihood  ratio  decision  variables  and  the  likelihood  ratio  of  those  likelihood  ratio  decision  variables  for  a  given 
task. 


Figure 3. The (a) first and (b) second components of the output of the second three-class BANN as a function of the
corresponding component of its input for the observations actually drawn from class $\pi_1$ in the three-class simulation study.

Figure 4. The (a) first and (b) second components of the output of the second three-class BANN as a function of the
corresponding component of its input for the observations actually drawn from class $\pi_2$ in the three-class simulation study.

Figure 5. The (a) first and (b) second components of the output of the second three-class BANN as a function of the
corresponding component of its input for the observations actually drawn from class $\pi_3$ in the three-class simulation study.


We currently lack a fully general method for three-class classification or for practically evaluating the performance
of a three-class classifier. As a first step toward such a classification method, we are investigating the use
of  BANNs  to  estimate  three-class  ideal  observer  decision  variables  for  such  a  task.  Since,  in  a  practical  situation, 
we  will  not  have  access  to  the  underlying  probability  distributions  from  which  the  observational  data  are  drawn, 
we  must  rely  on  circumstantial  evidence  in  support  of  our  claim  that  a  three-class  BANN  can  adequately  estimate 
decision  variables  directly  related  to  ideal  observer  decision  variables. 

Previously,  we  presented  work  relating  the  output  of  a  three-class  BANN  to  the  outputs  of  two-class  BANNs 
trained  for  various  “simplified”  cases  in  which  the  three-class  classification  task  was  reduced  to  a  two-class 
classification  task,  and  showed  that  the  relationships  found  were  consistent  with  the  relationship  between  three- 
and  two-class  ideal  observers  for  the  same  tasks.12  In  the  present  work,  we  showed  that  the  output  of  two-  and 
three-class  BANNs  was  consistent,  to  within  experimental  error,  with  the  theoretical  relationship  developed  for 
actual  a  posteriori  class  membership  probabilities.  This  is  of  limited  practical  use  in  the  complete  development  of 
a  three-class  classifier,  mainly  because  the  three-class  ideal  observer  decision  rule  is  considerably  more  complicated 
than  its  two-class  counterpart  (a  simple  threshold  on  a  single  decision  variable).  It  does,  however,  bolster  our 
confidence  in  the  choice  of  the  BANN  as  an  appropriate  tool  for  estimating  the  decision  variables  which  would 
eventually  be  incorporated  in  such  a  classifier. 
ABSTRACT

We analyzed three recently proposed decision rules for three-class classification from the point of view of
ideal observer decision theory: one by Scurfield, one by Chan et al., and one by Mossman.
Scurfield’s decision rule can be shown to be a special
case  of  the  three-class  ideal  observer  decision  rule  in  two  different  situations:  when  the  pair  of  decision  variables 
is  the  pair  of  likelihood  ratios  used  by  the  ideal  observer,  and  when  the  pair  of  decision  variables  is  the  pair  of 
logarithms  of  the  likelihood  ratios.  Chan  et  al.  start  with  an  ideal  observer  model,  where  two  of  the  decision 
lines  used  by  the  ideal  observer  overlap,  and  the  third  line  becomes  undefined.  Finally,  we  showed  that  the 
Mossman  decision  rule  (in  which  a  single  decision  line  separates  one  class  from  the  other  two,  while  a  second  line 
separates  those  two  classes)  cannot  be  a  special  case  of  the  ideal  observer  decision  rule.  Despite  the  considerable 
difficulties  presented  by  the  three-class  classification  task  compared  with  two-class  classification,  we  found  that 
the  three-class  ideal  observer  provides  a  useful  framework  for  analyzing  a  wide  variety  of  three-class  decision 
strategies. 

Keywords:  ROC  analysis,  three-class  classification,  ideal  observer  decision  rules 

1.  INTRODUCTION 

We  are  attempting  to  develop  a  fully  automated  mass  lesion  classification  scheme  for  computer-aided  diagnosis 
(CAD)  in  mammography.  This  scheme  will  combine  two  schemes  developed  at  the  University  of  Chicago:  one  for 
automatically  detecting  mass  lesions  in  mammograms,1-5  and  one  for  classifying  known  lesions  as  malignant  or 
benign.6-10  Combining  these  two  types  of  CAD  scheme  is  inherently  difficult,  because  the  output  of  the  detection 
scheme  will  necessarily  include  false-positive  (FP)  computer  detections  in  addition  to  the  malignant  and  benign 
lesions  to  be  classified.  These  FP  computer  detections  correspond  to  objects  which  were  by  design  not  included 
in  the  training  sample  of  the  classification  scheme,  because  they  are  not  members  of  the  data  population  (benign 
and  malignant  mass  breast  lesions)  for  which  the  classification  scheme  was  created.  It  is  clear  then  that  the 
detection  scheme’s  output  cannot  be  used  unmodified  as  the  input  to  the  classification  scheme. 

Our  approach  has  been  to  treat  this  problem  explicitly  as  a  three-class  classification  task.  That  is,  the 
outputs  of  the  detection  scheme  should  be  classified  as  malignant  lesions,  benign  lesions,  and  non-lesions  (FP 
computer  detections),  and  the  classifier  to  be  estimated  is  the  ideal  observer  decision  rule  for  this  task.  Such 
an  approach  presents  considerable  difficulties  of  its  own.  On  the  one  hand,  decision  rules,  in  particular  ideal 
observer  decision  rules,  increase  rapidly  in  complexity  with  the  number  of  classes  involved.  On  the  other  hand,  a 
fully  general  performance  evaluation  method,  such  as  a  three-class  extension  of  receiver  operating  characteristic 
(ROC)  analysis,  has  yet  to  be  developed. 

The  explicit  form  of  the  ideal  observer  in  a  three-class  classification  task  has  been  known  for  some  time.11 
For  the  reasons  just  stated,  however,  a  practical  method  for  estimating  and  evaluating  observer  performance 
based  on  an  ideal  observer  model  has  proven  elusive,  despite  the  success  of  the  two-class  binormal  ideal  observer 
model.12  Nevertheless,  pragmatic  observer  decision  rule  models  for  three-class  classification  tasks  have  been 
proposed  relatively  recently  by  several  groups  of  researchers.  In  some  cases,  these  models  are  motivated  more 
by  considerations  of  tractability  than  of  complete  generality.  This  is  of  course  understandable  given  the  inherent 
difficulties  of  three-class  classification;  however,  we  thought  it  might  be  of  interest  to  analyze  a  number  of  recently 
proposed  three-class  decision  rule  models  within  an  ideal  observer  decision  rule  framework. 
In the next section, we review the three-class ideal observer decision rule. In the following three sections, we
review recently proposed three-class decision rule models: one by Scurfield,13 one by Chan et al.,14 and one by
Mossman.15  In  each  case,  the  given  decision  rule  is  analyzed  in  terms  of  the  ideal  observer  decision  rule;  where 
necessary  or  expedient,  assumptions  are  made  about  the  observer’s  decision  variables  in  order  to  facilitate  this 
analysis.  We  emphasize  that  we  do  not  attempt  a  review  of  the  experimental  methods  in  the  works  discussed; 
we  are  specifically  interested  only  in  the  form  of  the  decision  rule  which  serves  as  the  starting  point  for  each 
work.  The  results  of  our  analyses  are  briefly  summarized  in  Sec.  6. 


2.  THE  THREE-CLASS  IDEAL  OBSERVER 

It can be shown11,16 that an $N$-class ideal observer makes decisions regarding statistically variable observations
x  by  partitioning  a  likelihood  ratio  decision  variable  space,  where  the  boundaries  of  the  partitions  are  given  by 
hyperplanes: 
Here $U_{i|j}$ is the utility of deciding an observation is from class $\pi_i$ given that it is actually from class $\pi_j$, and the
$N - 1$ likelihood ratios are defined as

$$\mathrm{LR}_i = \frac{p_x(x \mid t = \pi_i)}{p_x(x \mid t = \pi_N)} \tag{3}$$

for $i < N$. We also define the actual class (the “truth”) to which an observation belongs as $t$, and the class to
which it is assigned (the “decision”) as $d$, where $t$ and $d$ can take on any of the values $\pi_1, \ldots, \pi_i, \ldots, \pi_N$, the
labels of the various classes. (We use boldface type to denote statistically variable quantities.)

The  partitioning  of  the  decision  variable  space  is  determined  by  the  parameters 


$$\gamma_{ijk} = (U_{i|k} - U_{j|k})\,P(t = \pi_k), \tag{4}$$

with $i$, $j$, and $k$ varying from 1 to $N$, and $j \neq i$. Note that these parameters are not independent, however,
because

$$\gamma_{ijk} = \gamma_{kjk} - \gamma_{kik}. \tag{5}$$

We  can  impose  the  reasonable  condition  that  the  utility  for  correctly  classifying  an  observation  from  a  given 
class  should  be  greater  than  any  utility  for  incorrectly  classifying  an  observation  from  the  same  class,  i.e., 
$U_{i|i} > U_{j|i}$ $\{i \neq j\}$. This gives, for $j \neq i$,

$$\gamma_{iji} > 0, \tag{6}$$

leaving $N(N - 1)$ parameters (the rest are derivable from Eq. 5).

Finally, note that the hyperplanes represented by Eqs. 1 and 2 are unchanged if we multiply all of these
equations by a single scalar, such as $1/(\sum_{i \neq j} \gamma_{iji})$. This leaves us with $N^2 - N - 1$ degrees of freedom, as
expected.

The  behavior  of  a  three-class  ideal  observer  is  completely  determined  by  the  three  decision  boundary  lines 

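$$\gamma_{121}\,\mathrm{LR}_1 - \gamma_{212}\,\mathrm{LR}_2 = \gamma_{313} - \gamma_{323} \tag{7}$$

$$\gamma_{131}\,\mathrm{LR}_1 + (\gamma_{232} - \gamma_{212})\,\mathrm{LR}_2 = \gamma_{313} \tag{8}$$

$$(\gamma_{131} - \gamma_{121})\,\mathrm{LR}_1 + \gamma_{232}\,\mathrm{LR}_2 = \gamma_{323}, \tag{9}$$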


Figure 1. Example three-class ideal observer decision rule, given the values of the decision parameters $\gamma_{121} = \gamma_{212} = 3/14$
and $\gamma_{131} = \gamma_{313} = \gamma_{232} = \gamma_{323} = 1/7$. Note $\gamma_{ijk} = (U_{i|k} - U_{j|k})\,P(t = \pi_k)$.


which we call, respectively, the “1-vs.-2” line, the “1-vs.-3” line, and the “2-vs.-3” line. Note that if any two
of these lines intersect, the third line must also share this intersection point. We also emphasize the simple
interpretation, from Eq. 4, of each of the $\gamma_{iji}$ parameters appearing in these decision boundary line equations
as the difference in utilities between a “correct” and one particular “incorrect” decision (scaled by the a priori
probability of the true class in question); and of each difference in the $\gamma_{iji}$ parameters as a difference in utilities
between two possible “incorrect” decisions (again scaled by the a priori probability of the true class in question).
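
As a brief sketch of where the “1-vs.-2” line in Eq. 7 comes from, assume (as is standard in this framework) that the ideal observer chooses the decision with the greatest expected utility. The observer then prefers $\pi_1$ to $\pi_2$ when the first inequality below holds; the remaining steps use only Eqs. 3, 4, and 5 with $N = 3$.

```latex
\begin{align*}
\sum_{k=1}^{3} \left(U_{1|k} - U_{2|k}\right) P(t = \pi_k)\, p_x(x \mid t = \pi_k) &> 0 \\
\gamma_{121}\,\mathrm{LR}_1 + \gamma_{122}\,\mathrm{LR}_2 + \gamma_{123} &> 0
  && \text{(divide by } p_x(x \mid t = \pi_3)\text{; Eqs. 3 and 4)} \\
\gamma_{121}\,\mathrm{LR}_1 - \gamma_{212}\,\mathrm{LR}_2 &> \gamma_{313} - \gamma_{323}
  && \text{(}\gamma_{122} = -\gamma_{212},\ \gamma_{123} = \gamma_{323} - \gamma_{313}\text{)}
\end{align*}
```

Replacing the inequality by an equality gives the boundary line; the “1-vs.-3” and “2-vs.-3” lines follow in the same way from the corresponding pairs of decisions.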

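An example ideal observer decision rule for particular values of the utilities $U_{i|j}$, and hence of the parameters
$\gamma_{iji}$, is shown in Fig. 1. Here we have chosen $\gamma_{121} = \gamma_{212} = 3/14$ and $\gamma_{131} = \gamma_{313} = \gamma_{232} = \gamma_{323} = 1/7$, yielding
the decision boundary lines

$$\frac{3}{14}\,\mathrm{LR}_1 - \frac{3}{14}\,\mathrm{LR}_2 = 0 \quad \{\text{“1-vs.-2”}\} \tag{10}$$

$$\frac{1}{7}\,\mathrm{LR}_1 - \frac{1}{14}\,\mathrm{LR}_2 = \frac{1}{7} \quad \{\text{“1-vs.-3”}\} \tag{11}$$

$$-\frac{1}{14}\,\mathrm{LR}_1 + \frac{1}{7}\,\mathrm{LR}_2 = \frac{1}{7} \quad \{\text{“2-vs.-3”}\}. \tag{12}$$

These simplify to the equations $\mathrm{LR}_2 = \mathrm{LR}_1$, $\mathrm{LR}_2 = 2\,\mathrm{LR}_1 - 2$, and $\mathrm{LR}_2 = \mathrm{LR}_1/2 + 1$, respectively.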
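
The arithmetic of this example can be checked mechanically. The following is a minimal sketch, assuming each boundary line is stored as a coefficient triple $(a, b, c)$ meaning $a\,\mathrm{LR}_1 + b\,\mathrm{LR}_2 = c$; the function names are illustrative and not part of the original work. Exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction as F


def boundary_lines(g121, g212, g131, g313, g232, g323):
    """Return (a, b, c) for each line a*LR1 + b*LR2 = c, following Eqs. 7-9."""
    return {
        "1-vs-2": (g121, -g212, g313 - g323),
        "1-vs-3": (g131, g232 - g212, g313),
        "2-vs-3": (g131 - g121, g232, g323),
    }


def slope_intercept(line):
    """Rewrite a*LR1 + b*LR2 = c as LR2 = m*LR1 + q (requires b != 0)."""
    a, b, c = line
    return (-a / b, c / b)


def common_point(line_p, line_q):
    """Intersection of two non-parallel lines, by Cramer's rule."""
    a1, b1, c1 = line_p
    a2, b2, c2 = line_q
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


# Parameter values of the example shown in Fig. 1.
lines = boundary_lines(F(3, 14), F(3, 14), F(1, 7), F(1, 7), F(1, 7), F(1, 7))
for name, line in lines.items():
    print(name, slope_intercept(line))
print(common_point(lines["1-vs-2"], lines["1-vs-3"]))
```

For these parameter values the first two lines meet at $\mathrm{LR}_1 = \mathrm{LR}_2 = 2$, which also satisfies the third line, illustrating the shared intersection point noted in Sec. 2.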

3.  THE  SCURFIELD  DECISION  RULE 

Scurfield investigated a decision rule applied to two-dimensional statistically variable data ($y = (y_1, y_2)$) drawn
from three classes.13 The application domain was human observer performance modeling for acoustical
psychophysics experiments. (In prior work, Scurfield investigated a decision rule for three-class classification of
univariate data.17 We will not review that prior work here, because at present we are interested in relating given
observer models to the three-class ideal observer model for multivariate observational data, which yield
two-dimensional decision variable data by Eq. 3.) In Scurfield’s work, no assumptions are made about the decision
variables $y_1$ and $y_2$; in particular, these decision variables are not assumed to be related in any way to an ideal
observer  model.  This  is  entirely  appropriate  given  the  nature  of  the  problem  domain  Scurfield  investigated  — 
i.e.,  human  observer  performance  modeling.  It  can  readily  be  shown,  however,  that  if  one  chooses  to  make  such 
assumptions,  special  cases  of  the  Scurfield  model  are  in  fact  special  cases  of  an  ideal  observer  decision  rule. 


Figure 2. Decision rule investigated by Scurfield, for the decision parameters $\gamma_1$ and $\gamma_2$.


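The Scurfield decision rule is dependent on two decision parameters, which we will call $\gamma_1$ and $\gamma_2$. The
decision rule can be written as

$$\text{decide } d = \pi_1 \text{ iff } y_1 - y_2 > \gamma_1 - \gamma_2 \text{ and } y_1 > \gamma_1; \tag{13}$$

$$\text{decide } d = \pi_2 \text{ iff } y_1 - y_2 < \gamma_1 - \gamma_2 \text{ and } y_2 > \gamma_2; \tag{14}$$

$$\text{decide } d = \pi_3 \text{ iff } y_1 < \gamma_1 \text{ and } y_2 < \gamma_2. \tag{15}$$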

This decision rule is illustrated in Fig. 2. From these relations, one can define the decision boundary lines

$$y_1 - y_2 = \gamma_1 - \gamma_2 \quad \{\text{“1-vs.-2”}\} \tag{16}$$

$$y_1 = \gamma_1 \quad \{\text{“1-vs.-3”}\} \tag{17}$$

$$y_2 = \gamma_2 \quad \{\text{“2-vs.-3”}\}. \tag{18}$$

Note the similarity in form between these equations and Eqs. 7-9. If we choose $y_1 = \mathrm{LR}_1(x)$ and $y_2 = \mathrm{LR}_2(x)$
for some set of observational data $x$, we have a special case of Eqs. 7-9, which is illustrated in Fig. 3.

A second correspondence between Scurfield’s decision rule and the ideal observer decision rule can be obtained
by taking $y_1 = \log(\mathrm{LR}_1(x))$ and $y_2 = \log(\mathrm{LR}_2(x))$; note that a line of the form $\log(\mathrm{LR}_2) = \log(\mathrm{LR}_1) + \alpha$
corresponds to a line of the form $\mathrm{LR}_2 = \beta\,\mathrm{LR}_1$ for appropriate constants $\alpha$ and $\beta$. By inspection, this is again a
special case of Eqs. 7-9, which is illustrated in Fig. 4.

Scurfield points out13 that the observer which maximizes $P_C$, the “percent correct” or probability of a
correct response, is a special case of the ideal observer (i.e., a single operating point achievable by the ideal
observer for the given task). This observer follows the Scurfield decision rule model with $y_1 = \log(\mathrm{LR}_1(x))$ and
$y_2 = \log(\mathrm{LR}_2(x))$, and decision parameters given by $e^{\gamma_1} = P(\pi_3)/P(\pi_1)$ and $e^{\gamma_2} = P(\pi_3)/P(\pi_2)$. It is interesting
to note that the Scurfield decision rule model can in fact be used to describe ideal observer performance for an
even wider class of operating points, as shown in this section.
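
The correspondence with the maximum-percent-correct observer can be illustrated numerically. The sketch below assumes that maximizing the probability of a correct response amounts to choosing the class with the largest a posteriori probability, which is proportional to $P(\pi_i)\,\mathrm{LR}_i$ with $\mathrm{LR}_3 = 1$. The prior probabilities, the log-normal sampling of the likelihood ratios, and the function names are hypothetical choices made only for the demonstration.

```python
import math
import random


def scurfield_decide(y1, y2, g1, g2):
    """Scurfield's rule (Eqs. 13-15); returns 1, 2, or 3. Exact ties go to class 3."""
    if y1 - y2 > g1 - g2 and y1 > g1:
        return 1
    if y1 - y2 < g1 - g2 and y2 > g2:
        return 2
    return 3


def map_decide(lr1, lr2, p1, p2, p3):
    """Pick the class with the largest a posteriori probability."""
    scores = {1: lr1 * p1, 2: lr2 * p2, 3: p3}
    return max(scores, key=scores.get)


def agreement(priors=(0.5, 0.3, 0.2), n=10000, seed=0):
    """Fraction of random (LR1, LR2) pairs on which the two rules agree."""
    p1, p2, p3 = priors
    # Decision parameters for the maximum-percent-correct operating point.
    g1, g2 = math.log(p3 / p1), math.log(p3 / p2)
    rng = random.Random(seed)
    same = 0
    for _ in range(n):
        lr1 = rng.lognormvariate(0.0, 2.0)
        lr2 = rng.lognormvariate(0.0, 2.0)
        d = scurfield_decide(math.log(lr1), math.log(lr2), g1, g2)
        same += d == map_decide(lr1, lr2, p1, p2, p3)
    return same / n


if __name__ == "__main__":
    print(agreement())
```

The two rules should agree everywhere except possibly on the boundary lines themselves, where the strict inequalities of Eqs. 13-15 leave the decision unspecified.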

4.  THE  CHAN  DECISION  RULE 

Chan et al. are investigating three-class classifiers for computer-aided diagnosis.14 Their work is motivated by
reasoning similar in principle to that which we independently arrived at when we began to consider this problem.
In particular, they consider a clinical situation in which observations must be classified as malignant, benign,
or normal. Because the goal of their work is to optimize the performance of a system to aid a radiologist or
clinician, rather than to measure the psychophysical performance of an existing observer, they choose to start
explicitly from an ideal observer model in constructing their decision rule.

Figure 3. A special case of the ideal observer decision rule, which is a special case of the Scurfield decision rule with
$y_1 = \mathrm{LR}_1(x)$ and $y_2 = \mathrm{LR}_2(x)$.

Figure 4. A special case of the ideal observer decision rule which is a special case of the Scurfield decision rule with
$y_1 = \log(\mathrm{LR}_1(x))$ and $y_2 = \log(\mathrm{LR}_2(x))$.

Figure 5. The decision rule investigated by Chan et al., which as they state is a special case of the ideal observer decision
rule. Observations in the unlabelled region are decided “not $\pi_3$”, i.e., either “$\pi_1$” or “$\pi_2$”.

In order to reduce the complexity of the ideal observer decision rule to manageable proportions, Chan et al.
impose restrictions on the utilities used by their observer. In their formulation, the class we are labelling $\pi_1$ is
the benign class; $\pi_2$, the normal class; and the malignant class is $\pi_3$. They further assume that the possible
values of any utility $U_{i|j}$ are restricted to the interval $[0, 1]$. They then set $U_{1|1} = U_{2|2} = U_{3|3} = 1$ (i.e., correctly
identifying any case has maximal utility). Furthermore, they require $U_{2|1} = U_{1|2} = 1$ and $U_{1|3} = U_{2|3} = 0$
(i.e., misidentifying a benign case as normal, or vice versa, has no significant cost reducing the utility of such a
decision from the maximum, but misclassifying an actually malignant case as benign or normal has the minimum
possible utility). Finally, $U_{3|1}$ and $U_{3|2}$ are assumed to have arbitrary values on the open interval $(0, 1)$ (i.e.,
misclassifying  an  actually  non-malignant  case  as  malignant  will  have  some  cost  reducing  the  utility  of  such 
a  decision  from  the  maximum,  but  such  a  misclassification  is  in  some  sense  “better”  than  missing  an  actual 
malignancy).  It  is  important  to  note  that  these  assumptions  are  arguably  relevant  to  a  reasonable  model  of  a 
clinical  situation,  and  are  thus  of  interest  beyond  their  superficial  advantage  in  reducing  the  degrees  of  freedom 
involved  in  the  observer’s  decision  rule.  We  will,  however,  only  consider  the  latter  issue  in  the  remainder  of  this 
section. 

Substituting  the  values  of  the  utilities  given  above  into  Eq.  4,  we  obtain  decision  boundary  lines  of  the  form 


$$0 \cdot \mathrm{LR}_1 + 0 \cdot \mathrm{LR}_2 = 0 \quad \{\text{“1-vs.-2”}\} \tag{19}$$

$$\frac{(1 - U_{3|1})\,P(t = \pi_1)}{a}\,\mathrm{LR}_1 + \frac{(1 - U_{3|2})\,P(t = \pi_2)}{a}\,\mathrm{LR}_2 = \frac{P(t = \pi_3)}{a} \quad \{\text{“1-vs.-3”}\} \tag{20}$$

$$\frac{(1 - U_{3|1})\,P(t = \pi_1)}{a}\,\mathrm{LR}_1 + \frac{(1 - U_{3|2})\,P(t = \pi_2)}{a}\,\mathrm{LR}_2 = \frac{P(t = \pi_3)}{a} \quad \{\text{“2-vs.-3”}\} \tag{21}$$

where $a = 1 + P(t = \pi_3) - U_{3|1}\,P(t = \pi_1) - U_{3|2}\,P(t = \pi_2)$. Note that, as Chan et al. point out, the “1-vs.-2”
line is in fact undefined for this choice of utilities, while the “1-vs.-3” and “2-vs.-3” lines are identical. This is a
general consequence of Eqs. 7-9; if any two of these equations yield identical lines, the third line must be either
identical to them or undefined. The decision rule considered by Chan et al. is illustrated in Fig. 5.
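
The collapse of the three boundary lines under these utilities can be reproduced directly from Eq. 4 and Eqs. 7-9. The sketch below is illustrative only: the prior probabilities and the values chosen for $U_{3|1}$ and $U_{3|2}$ are hypothetical, the coefficients are left unnormalized (not divided by $a$), and the helper names are not taken from Chan et al.

```python
from fractions import Fraction as F


def chan_lines(u31, u32, P):
    """Boundary lines (a, b, c), meaning a*LR1 + b*LR2 = c, for Chan et al.'s utilities.

    U[(i, k)] is the utility of deciding class i when the truth is class k;
    P[k] is the prior probability of class k (1 benign, 2 normal, 3 malignant).
    """
    U = {(1, 1): 1, (2, 2): 1, (3, 3): 1,  # correct decisions
         (2, 1): 1, (1, 2): 1,             # benign/normal confusion costs nothing
         (1, 3): 0, (2, 3): 0,             # missed malignancy
         (3, 1): u31, (3, 2): u32}         # false alarms (free parameters)

    def g(i, j, k):
        # gamma_ijk = (U[i|k] - U[j|k]) * P(t = pi_k), as in Eq. 4
        return (U[(i, k)] - U[(j, k)]) * P[k]

    return {
        "1-vs-2": (g(1, 2, 1), -g(2, 1, 2), g(3, 1, 3) - g(3, 2, 3)),
        "1-vs-3": (g(1, 3, 1), g(2, 3, 2) - g(2, 1, 2), g(3, 1, 3)),
        "2-vs-3": (g(1, 3, 1) - g(1, 2, 1), g(2, 3, 2), g(3, 2, 3)),
    }


# Hypothetical priors and false-alarm utilities, for illustration only.
priors = {1: F(3, 10), 2: F(6, 10), 3: F(1, 10)}
lines = chan_lines(F(1, 2), F(1, 4), priors)
for name, coeffs in lines.items():
    print(name, coeffs)
```

With any admissible choice of the two free utilities, the “1-vs.-2” coefficients should all vanish and the other two triples should coincide, which is the behavior described above.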


Figure 6. Decision rule investigated by Mossman, for the decision parameters $\alpha$ and $\beta$, shown in the a posteriori class
probability space.


5.  THE  MOSSMAN  DECISION  RULE 

Mossman investigates a decision rule applied to a set of three decision variables $y_1$, $y_2$, and $y_3$, subject to the
constraint

$$y_1 + y_2 + y_3 = 1, \tag{22}$$

as well as $0 \le y_i \le 1$ $\{1 \le i \le 3\}$. This is consistent with the constraint on the a posteriori class probabilities,
$P(\pi_1|x) + P(\pi_2|x) + P(\pi_3|x) = 1$; these quantities are known to be directly related to the likelihood ratio ideal
observer decision variables.18,19 (In this section we will write $P(\pi_i|x)$ instead of $P(t = \pi_i|x)$ for simplicity.)
Mossman does not explicitly require, however, that the decision variables in Eq. 22 be the a posteriori class
probabilities (e.g., they may be noisy estimates of these quantities).

The decision rule considered by Mossman, which depends on two decision parameters $\alpha$ and $\beta$, is

$$\text{decide } d = \pi_1 \text{ iff } y_2 - y_1 < \beta \text{ and } y_3 < \alpha; \tag{23}$$

$$\text{decide } d = \pi_2 \text{ iff } y_2 - y_1 > \beta \text{ and } y_3 < \alpha; \tag{24}$$

$$\text{decide } d = \pi_3 \text{ iff } y_3 > \alpha, \tag{25}$$

where $0 \le \alpha \le 1$ and $-1 \le \beta \le 1$. From these relations, and given the relation $y_3 = 1 - y_1 - y_2$ from Eq. 22,
one can define the decision boundary lines

$$y_1 - y_2 = -\beta \quad \{\text{“1-vs.-2”}\} \tag{26}$$

$$y_1 + y_2 = 1 - \alpha \quad \{\text{“1-vs.-3”}\} \tag{27}$$

$$y_1 + y_2 = 1 - \alpha \quad \{\text{“2-vs.-3”}\}. \tag{28}$$

This decision rule is illustrated in Fig. 6. Note that, similar to the Chan et al. decision rule, the “1-vs.-3” and
“2-vs.-3” decision boundary lines are identical.

We now consider a special case of the Mossman decision rule in which $y_1 = P(\pi_1|x)$, $y_2 = P(\pi_2|x)$, and
$y_3 = P(\pi_3|x)$ for some observational data vector $x$. This version of the decision rule is illustrated in Fig. 7.

Figure 7. Decision rule investigated by Mossman, for the decision parameters $\alpha$ and $\beta$, shown in likelihood ratio space.

Although the Mossman decision rule appears similar in form to the ideal observer decision rule, recall from
Sec. 4 that if two of the decision boundary line equations are identical, the third must yield a line identical to
the first two or be undefined. Another way to see this is to note that the coefficients of Eq. 9 are differences of
the corresponding coefficients of Eqs. 7 and 8. If the coefficients of Eqs. 8 and 9 are identical, it must be the case
that the coefficients of Eq. 7 are all zero. For the Mossman decision rule, this would require $1 + \beta = 0$, $1 - \beta = 0$,
and $\beta = 0$ simultaneously, which is clearly impossible. It follows that the decision rule considered by Mossman
cannot represent possible ideal observer performance for any choice of the utilities $U_{i|j}$ in Eqs. 1 and 2.
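
The three simultaneous conditions on $\beta$ can be recovered with a short calculation. This is a sketch rather than a derivation given in the text; it assumes $y_i = P(\pi_i|x)$ and uses the standard Bayes relation $P(\pi_i|x) \propto P(\pi_i)\,\mathrm{LR}_i$, with $\mathrm{LR}_3 = 1$ and a common normalizing denominator.

```latex
\begin{align*}
y_2 - y_1 &= \beta\,(y_1 + y_2 + y_3)
  && \text{(1-vs.-2 line of Eq. 26, with Eq. 22)} \\
(1 + \beta)\,y_1 - (1 - \beta)\,y_2 + \beta\,y_3 &= 0 \\
(1 + \beta)\,P(\pi_1)\,\mathrm{LR}_1 - (1 - \beta)\,P(\pi_2)\,\mathrm{LR}_2 &= -\beta\,P(\pi_3)
  && \text{(multiply by the common denominator)}
\end{align*}
```

Since the prior probabilities are nonzero, the coefficients of this line in likelihood ratio space all vanish only if $1 + \beta$, $1 - \beta$, and $\beta$ are zero at once.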

6. DISCUSSION AND CONCLUSIONS

We  examined  three  decision  rules  proposed  recently  for  three-class  classification  tasks  by  different  researchers. 
The  basis  for  our  evaluation  was  ideal  observer  decision  theory,  primarily  because  our  own  interest  in  the  three- 
class  classification  task  is  its  possible  application  to  CAD. 

Although  this  is  not  the  most  general  approach  to  three-class  classification,  the  three-class  classification  task 
is  difficult  enough  that  it  is  perhaps  worth  making  any  attempt  to  analyze,  from  a  single  point  of  view,  the  work 
of  the  relatively  few  researchers  investigating  this  problem. 

In  particular,  Scurfield  points  out13  that  his  proposed  decision  rule  is  in  fact  an  ideal  observer  decision  rule 
for  a  single  ideal  observer  operating  point,  namely  the  observer  which  maximizes  the  probability  of  any  correct 
response  (or  “percent  correct”  or  Pc).  We  were  able  to  show  that,  under  various  assumptions,  a  larger  set  of 
such  correspondences  between  the  Scurfield  observer  and  the  ideal  observer  exists. 

Chan et al. are working on the application of three-class classification to CAD, and thus explicitly take
the  ideal  observer  as  the  starting  point  in  the  development  of  their  decision  rule.14  Although  this  rendered  our 
analysis  of  that  decision  rule  in  terms  of  ideal  observer  decision  theory  largely  trivial,  it  provided  an  intuitive 
basis  for  understanding  the  results  of  similar  analysis  of  the  Mossman  decision  rule,  namely  the  conclusion  that 
the  latter  does  not  correspond  to  ideal  observer  behavior  for  any  possible  values  of  the  utilities  used  by  the  ideal 
observer.  However,  we  note  that  the  structure  of  the  Mossman  decision  rule  —  a  simple  sequence  of  thresholds 
on  single  decision  variables  —  may  indeed  serve  as  a  reasonable  model  for  human  observer  performance  in  certain 
situations,  e.g.,  differential  diagnosis. 
Abstract

We  analyze  recently  proposed  decision  rules  for  three-class  classification  from  the 
point  of  view  of  ideal  observer  decision  theory.  We  consider  three-class  decision 
rules  proposed  by  Scurfield,  by  Chan  et  al.,  and  by  Mossman.  Scurfield’s  decision 
rule  is  shown  to  be  a  special  case  of  the  three-class  ideal  observer  decision  rule  in 
three  different  situations.  Chan  et  al.  start  with  an  ideal  observer  model  and  specify 
its decision-consequence utility structure in a way that causes two of the decision
lines  used  by  the  ideal  observer  to  overlap  and  the  third  line  to  become  undefined. 
Finally,  we  show  that  the  Mossman  decision  rule  cannot  be  a  special  case  of  the  ideal 
observer  decision  rule.  Despite  the  considerable  difficulties  presented  by  the  three- 
class  classification  task,  the  three-class  ideal  observer  provides  a  useful  framework 
for  analyzing  a  variety  of  three-class  decision  strategies. 

Key  words:  ROC  analysis,  three-class  classification,  ideal  observer  decision  rules 


1  Introduction 


We  are  attempting  to  develop  a  fully  automated  mass  lesion  classification 
scheme  for  computer-aided  diagnosis  (CAD)  in  mammography.  This  scheme 
will  combine  two  schemes  developed  at  the  University  of  Chicago:  one  for 
automatically  detecting  mass  lesions  in  mammograms  (Bick,  Giger,  Schmidt, 
Nishikawa,  Wolverton,  and  Doi,  1995;  Yin,  Giger,  Doi,  Metz,  Vyborny,  and 
Schmidt, 1991; Yin, Giger, Vyborny, Doi, and Schmidt, 1993; Yin, Giger, Doi,
Vyborny,  and  Schmidt,  1994;  Kupinski,  2000),  and  one  for  classifying  known 
lesions  as  malignant  or  benign  (Huo,  Giger,  Vyborny,  Wolverton,  Schmidt,  and 
Doi,  1998;  Huo,  Giger,  and  Metz,  1999;  Huo,  Giger,  Vyborny,  Wolverton,  and 
Metz,  2000;  Huo,  Giger,  and  Vyborny,  2001;  Huo,  Giger,  Vyborny,  and  Metz, 
2002).  Combining  these  two  types  of  CAD  scheme  is  inherently  difficult,  be¬ 
cause  the  output  of  the  detection  scheme  will  necessarily  include  false-positive 
(FP)  computer  detections  in  addition  to  the  malignant  and  benign  lesions  to 
be  classified.  These  FP  computer  detections  correspond  to  objects  which  were 
by  design  not  included  in  the  training  sample  of  the  classification  scheme, 
because  they  are  not  members  of  the  data  population  (benign  and  malignant 
mass  breast  lesions)  for  which  the  classification  scheme  was  created.  It  is  clear 
then  that  the  detection  scheme’s  output  cannot  be  used  unmodified  as  the 
input  to  the  classification  scheme. 

Our approach has been to treat this problem explicitly as a three-class classification
task. That is, the outputs of the detection scheme should be classified as
malignant  lesions,  benign  lesions,  and  non-lesions  (FP  computer  detections), 
and  the  classifier  to  be  estimated  is  the  ideal  observer  decision  rule  for  this 
task.  Such  an  approach  presents  considerable  difficulties  of  its  own.  On  the 
one  hand,  decision  rules,  in  particular  ideal  observer  decision  rules,  increase 
rapidly  in  complexity  with  the  number  of  classes  involved.  On  the  other  hand, 
a  fully  general  performance  evaluation  method,  such  as  a  three-class  extension 
of  receiver  operating  characteristic  (ROC)  analysis,  has  yet  to  be  developed. 

The explicit form of the ideal observer in a three-class classification task has
been known for some time (Van Trees, 1968). For the reasons just stated, however,
a practical method for estimating and evaluating observer performance
based on an ideal observer model has proven elusive, despite the success of
the two-class binormal ideal observer model (Metz and Pan, 1999). Nevertheless,
pragmatic observer decision rule models for three-class classification
tasks have been proposed relatively recently by several groups of researchers.
In some cases, these models are motivated more by considerations of tractability
than of complete generality. This is of course understandable given the
inherent difficulties of three-class classification; however, we thought it might
be of interest to analyze a number of recently proposed three-class decision
rule models within an ideal observer decision rule framework.

In  the  next  section,  we  review  the  three-class  ideal  observer  decision  rule.  In 
the  following  three  sections,  we  review  recently  proposed  three-class  decision 
rule  models:  one  by  Scurfield  (1998),  one  by  Chan,  Sahiner,  Hadjiiski,  Petrick, 
and  Zhou  (2003),  and  one  by  Mossman  (1999).  In  each  case,  the  given  decision 
rule  is  analyzed  in  terms  of  the  ideal  observer  decision  rule;  where  necessary 
or  expedient,  assumptions  are  made  about  the  observer’s  decision  variables  in 
order to facilitate this analysis. We emphasize that we do not attempt a review
of the experimental methods or proposed performance evaluation metrics in
the works discussed; we are specifically interested only in the form of the
decision rule which serves as the starting point for each work. The results of
our analyses are briefly summarized in Sec. 6.


2  The  Three-Class  Ideal  Observer 


It can be shown (Van Trees, 1968; Edwards, Metz, and Kupinski, 2004b)
that an $N$-class ideal observer makes decisions regarding statistically variable
observations $\mathbf{x}$ by partitioning a likelihood ratio decision variable space, where
the boundaries of the partitions are given by hyperplanes:

decide $\mathbf{d} = \pi_i$ iff

$$\sum_{k=1}^{N-1} (U_{i|k} - U_{j|k}) P(\mathbf{t} = \pi_k)\,\mathrm{LR}_k \geq (U_{j|N} - U_{i|N}) P(\mathbf{t} = \pi_N) \qquad \{j < i\} \tag{1}$$

and

$$\sum_{k=1}^{N-1} (U_{i|k} - U_{j|k}) P(\mathbf{t} = \pi_k)\,\mathrm{LR}_k > (U_{j|N} - U_{i|N}) P(\mathbf{t} = \pi_N) \qquad \{j > i\}. \tag{2}$$

Here $U_{i|j}$ is the utility of deciding an observation is from class $\pi_i$ given that it
is actually from class $\pi_j$, and the $N - 1$ likelihood ratios are defined as

$$\mathrm{LR}_k \equiv \frac{p(\mathbf{x}|\mathbf{t} = \pi_k)}{p(\mathbf{x}|\mathbf{t} = \pi_N)} \tag{3}$$

for $k < N$. We also define the actual class (the “truth”) to which an observation
belongs as $\mathbf{t}$, and the class to which it is assigned (the “decision”) as $\mathbf{d}$, where $\mathbf{t}$
and $\mathbf{d}$ can take on any of the values $\pi_1, \ldots, \pi_i, \ldots, \pi_N$, the labels of the various
classes. (We use boldface type to denote statistically variable quantities.) For
simplicity, we will usually write $\pi_k$ to denote the event $\mathbf{t} = \pi_k$, as in the a
priori probability $P(\pi_k)$.

The partitioning of the decision variable space is determined by the parameters

$$\gamma_{ijk} \equiv (U_{i|k} - U_{j|k}) P(\pi_k), \tag{4}$$

with $i$, $j$, and $k$ varying from 1 to $N$, and $j \neq i$. Note that these parameters
are not independent, however, because

$$\gamma_{ijk} = \gamma_{kjk} - \gamma_{kik}. \tag{5}$$

We can impose the reasonable condition that the utility for correctly classifying
an observation from a given class should be greater than any utility
for incorrectly classifying an observation from the same class, i.e., $U_{i|i} > U_{j|i}$.
This gives, for $j \neq i$,

$$\gamma_{iji} > 0, \tag{6}$$

leaving $N(N - 1)$ parameters (the rest are derivable from (5)).

Finally, note that the hyperplanes represented by (1) and (2) are unchanged
if we multiply all of these relations by a single scalar, such as $1/(\sum_{i} \sum_{j \neq i} \gamma_{iji})$.
This leaves us with $N^2 - N - 1$ degrees of freedom, as expected.
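
As a rough numerical check of (4)-(6), the following Python sketch builds the parameters from an arbitrary utility matrix and set of a priori probabilities, confirms the identity (5), and counts the remaining degrees of freedom for N = 3. The function names, the randomly drawn utilities, and the particular priors are illustrative assumptions and are not taken from the works discussed.

```python
import itertools
import random

def gamma_params(U, priors):
    """Return gamma[i][j][k] = (U[i][k] - U[j][k]) * priors[k], as in (4).

    U[i][k] is the utility of deciding class i when the truth is class k
    (0-based indices here; the text uses 1-based class labels).
    """
    n = len(priors)
    return [[[(U[i][k] - U[j][k]) * priors[k] for k in range(n)]
             for j in range(n)] for i in range(n)]

def check_identity(gamma, tol=1e-12):
    """Check (5): gamma_ijk = gamma_kjk - gamma_kik for all i, j, k."""
    n = len(gamma)
    return all(abs(gamma[i][j][k] - (gamma[k][j][k] - gamma[k][i][k])) < tol
               for i, j, k in itertools.product(range(n), repeat=3))

random.seed(0)
N = 3
priors = [0.2, 0.3, 0.5]  # assumed a priori probabilities
# Assumed utilities: correct decisions have utility 1 and incorrect ones
# a random value below 0.9, so U[i][i] > U[j][i] and hence (6) holds.
U = [[1.0 if i == k else 0.9 * random.random() for k in range(N)]
     for i in range(N)]
gamma = gamma_params(U, priors)

# The N(N - 1) parameters gamma_iji (j != i) that remain after applying (5).
basic = {(i, j): gamma[i][j][i] for i in range(N) for j in range(N) if j != i}
scale = sum(basic.values())
normalized = {key: value / scale for key, value in basic.items()}
degrees_of_freedom = len(basic) - 1  # one is lost to the overall scale factor
```

Under these assumptions, `degrees_of_freedom` equals N^2 - N - 1 = 5 for the three-class task.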

The behavior of a three-class ideal observer is completely determined by the
three decision boundary lines

$$\gamma_{121}\mathrm{LR}_1 - \gamma_{212}\mathrm{LR}_2 = \gamma_{313} - \gamma_{323} \tag{7}$$

$$\gamma_{131}\mathrm{LR}_1 + (\gamma_{232} - \gamma_{212})\mathrm{LR}_2 = \gamma_{313} \tag{8}$$

$$(\gamma_{131} - \gamma_{121})\mathrm{LR}_1 + \gamma_{232}\mathrm{LR}_2 = \gamma_{323}, \tag{9}$$

which we call, respectively, the “1-vs.-2” line, the “1-vs.-3” line, and the
“2-vs.-3” line. Note that if any two of these lines intersect, the third line must
also share this intersection point. We also emphasize the simple interpretation,
from (4), of each of the $\gamma_{iji}$ parameters appearing in these decision boundary
line equations as the difference in utilities between a “correct” and one particular
“incorrect” decision (scaled by the a priori probability of the true class in
question); and of each difference in the $\gamma_{iji}$ parameters as a difference in
utilities between two possible “incorrect” decisions (again scaled by the a priori
probability of the true class in question).

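An example ideal observer decision rule for particular values of the utilities
$U_{i|j}$, and hence of the parameters $\gamma_{iji}$, is shown in Fig. 1. Here we have chosen
$\gamma_{121} = \gamma_{212} = 3/14$ and $\gamma_{131} = \gamma_{313} = \gamma_{232} = \gamma_{323} = 1/7$, yielding the decision
boundary lines

$$\tfrac{3}{14}\mathrm{LR}_1 - \tfrac{3}{14}\mathrm{LR}_2 = 0 \qquad \{\text{“1-vs.-2”}\} \tag{10}$$

$$\tfrac{1}{7}\mathrm{LR}_1 - \tfrac{1}{14}\mathrm{LR}_2 = \tfrac{1}{7} \qquad \{\text{“1-vs.-3”}\} \tag{11}$$

$$-\tfrac{1}{14}\mathrm{LR}_1 + \tfrac{1}{7}\mathrm{LR}_2 = \tfrac{1}{7} \qquad \{\text{“2-vs.-3”}\}. \tag{12}$$

These simplify to the equations $\mathrm{LR}_2 = \mathrm{LR}_1$, $\mathrm{LR}_2 = 2\mathrm{LR}_1 - 2$, and
$\mathrm{LR}_2 = \mathrm{LR}_1/2 + 1$, respectively.

Fig. 1. Example three-class ideal observer decision rule, given the values of the
decision parameters $\gamma_{121} = \gamma_{212} = 3/14$ and $\gamma_{131} = \gamma_{313} = \gamma_{232} = \gamma_{323} = 1/7$. Note
that $\gamma_{iji} = (U_{i|i} - U_{j|i})P(\mathbf{t} = \pi_i)$.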
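
The example of Fig. 1 can be reproduced with a few lines of Python. The sketch below evaluates the three boundary lines (7)-(9) at a point in likelihood ratio space and returns the resulting decision; exact rational arithmetic is used so that points on a boundary are handled predictably. The function name `ideal_decision`, the dictionary layout, and the convention that ties go to the higher-numbered class are assumptions made for illustration.

```python
from fractions import Fraction as F

# Decision parameters of the example in Fig. 1.
FIG1 = {"121": F(3, 14), "212": F(3, 14),
        "131": F(1, 7), "313": F(1, 7), "232": F(1, 7), "323": F(1, 7)}

def ideal_decision(lr1, lr2, g=FIG1):
    """Return 1, 2, or 3 for the point (LR1, LR2), using lines (7)-(9)."""
    # Left side minus right side of each boundary line; a positive value
    # favours the first-named class of the pair.
    s12 = g["121"] * lr1 - g["212"] * lr2 - (g["313"] - g["323"])   # (7)
    s13 = g["131"] * lr1 + (g["232"] - g["212"]) * lr2 - g["313"]   # (8)
    s23 = (g["131"] - g["121"]) * lr1 + g["232"] * lr2 - g["323"]   # (9)
    if s12 > 0 and s13 > 0:
        return 1
    if s12 <= 0 and s23 > 0:
        return 2
    return 3  # assumed convention: ties go to the higher-numbered class
```

For instance, `ideal_decision(3, 1)` falls below both the line LR2 = LR1 and the line LR2 = 2 LR1 - 2 and is assigned to class 1, while the common intersection point of the three lines, (2, 2), is assigned to class 3 under the assumed tie convention.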

3  The  Scurfield  Decision  Rule 


Scurfield investigated a decision rule applied to two-dimensional statistically
variable data ($\mathbf{y} = (\mathbf{y}_1, \mathbf{y}_2)$) drawn from three classes (Scurfield, 1998). The
application domain was human observer performance modeling for acoustical
psychophysics experiments. (In prior work, Scurfield investigated a decision
rule for three-class classification of univariate data (Scurfield, 1996). We will
not review that prior work here, because at present we are interested in relating
given observer models to the three-class ideal observer model for multivariate
observational data, which yield two-dimensional decision variable data by (3).)
In Scurfield’s work, no assumptions are made about the decision variables $\mathbf{y}_1$
and $\mathbf{y}_2$; in particular, these decision variables are not assumed to be related
in any way to an ideal observer model. This is entirely appropriate given the
nature of the problem domain Scurfield investigated, i.e., human observer
performance modeling. It can readily be shown, however, that if one chooses
to make such assumptions, special cases of the Scurfield model are in fact
special cases of an ideal observer decision rule.

The Scurfield decision rule is dependent on two decision parameters, which we
will call $\gamma_1$ and $\gamma_2$. The decision rule can be written as


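$$\text{decide } \mathbf{d} = \pi_1 \text{ iff } \mathbf{y}_1 - \mathbf{y}_2 > \gamma_1 - \gamma_2 \text{ and } \mathbf{y}_1 > \gamma_1; \tag{13}$$

$$\text{decide } \mathbf{d} = \pi_2 \text{ iff } \mathbf{y}_1 - \mathbf{y}_2 < \gamma_1 - \gamma_2 \text{ and } \mathbf{y}_2 > \gamma_2; \tag{14}$$

$$\text{decide } \mathbf{d} = \pi_3 \text{ iff } \mathbf{y}_1 < \gamma_1 \text{ and } \mathbf{y}_2 < \gamma_2. \tag{15}$$

This decision rule is illustrated in Fig. 2.

Fig. 2. Decision rule investigated by Scurfield, for the decision parameters $\gamma_1$ and
$\gamma_2$.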
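
The rule in (13)-(15) translates directly into code. The following Python sketch is only an illustration of the three conditions; the function name and the choice to return `None` for points that satisfy none of the strict inequalities are assumptions, since the rule as written does not assign such boundary points.

```python
def scurfield_decision(y1, y2, gamma1, gamma2):
    """Return 1, 2, or 3 according to the rule in (13)-(15)."""
    if y1 - y2 > gamma1 - gamma2 and y1 > gamma1:
        return 1
    if y1 - y2 < gamma1 - gamma2 and y2 > gamma2:
        return 2
    if y1 < gamma1 and y2 < gamma2:
        return 3
    # None of the strict inequalities holds: the point lies on a decision
    # boundary, which the rule as written leaves unassigned.
    return None
```

With both parameters set to 1, for example, the point (2.0, 0.5) is assigned to class 1, the point (0.5, 2.0) to class 2, and the point (0.5, 0.5) to class 3.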

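From these relations, one can define the decision boundary lines

$$\mathbf{y}_1 - \mathbf{y}_2 = \gamma_1 - \gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{16}$$

$$\mathbf{y}_1 = \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{17}$$

$$\mathbf{y}_2 = \gamma_2 \qquad \{\text{“2-vs.-3”}\}. \tag{18}$$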

If we choose $\mathbf{y}_1 = \mathrm{LR}_1(\mathbf{x})$ and $\mathbf{y}_2 = \mathrm{LR}_2(\mathbf{x})$ for some set of observational data
$\mathbf{x}$, we have

$$\frac{1}{\gamma_0}\mathrm{LR}_1 - \frac{1}{\gamma_0}\mathrm{LR}_2 = \frac{\gamma_1 - \gamma_2}{\gamma_0} \qquad \{\text{“1-vs.-2”}\} \tag{19}$$

$$\frac{1}{\gamma_0}\mathrm{LR}_1 = \frac{\gamma_1}{\gamma_0} \qquad \{\text{“1-vs.-3”}\} \tag{20}$$

$$\frac{1}{\gamma_0}\mathrm{LR}_2 = \frac{\gamma_2}{\gamma_0} \qquad \{\text{“2-vs.-3”}\}, \tag{21}$$

where $\gamma_0 = \gamma_1 + \gamma_2 + 4$. Note the similarity in form between these equations
and (7)-(9). If we require $\gamma_1$ and $\gamma_2$ to be positive, the correspondence is exact,
and this special case of (7)-(9) is illustrated in Fig. 3. (In fact, the intersection
of the ideal observer decision boundary lines can lie in any quadrant. However,
given a set of decision boundary lines with slopes as depicted in Fig. 2, the
occurrence of the intersection point in any quadrant other than the first would
result in an ideal observer operating point for which no observations were
assigned to class $\pi_3$. This “degenerate” case will not be considered here.)

Fig. 3. A special case of the ideal observer decision rule with
$\gamma_{121} = \gamma_{212} = \gamma_{131} = \gamma_{232} = 1/(\gamma_1 + \gamma_2 + 4)$, $\gamma_{313} = \gamma_1/(\gamma_1 + \gamma_2 + 4)$, and
$\gamma_{323} = \gamma_2/(\gamma_1 + \gamma_2 + 4)$. The parameters $\gamma_1$ and $\gamma_2$ are positive but otherwise
arbitrary; this decision rule is a special case of the Scurfield decision rule with
$\mathbf{y}_1 = \mathrm{LR}_1(\mathbf{x})$ and $\mathbf{y}_2 = \mathrm{LR}_2(\mathbf{x})$.

A second correspondence between Scurfield’s decision rule and the ideal observer
decision rule can be obtained by taking $\mathbf{y}_1 = \log(\mathrm{LR}_1(\mathbf{x}))$ and
$\mathbf{y}_2 = \log(\mathrm{LR}_2(\mathbf{x}))$, with $\gamma_1$ and $\gamma_2$ now unrestricted. Substituting this definition in
(16)-(18), we obtain


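$$\log(\mathrm{LR}_1) - \log(\mathrm{LR}_2) = \gamma_1 - \gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{22}$$

$$\log(\mathrm{LR}_1) = \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{23}$$

$$\log(\mathrm{LR}_2) = \gamma_2 \qquad \{\text{“2-vs.-3”}\}. \tag{24}$$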

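Taking exponentials on each side of these equations then gives

$$\frac{\mathrm{LR}_1}{\mathrm{LR}_2} = e^{\gamma_1 - \gamma_2} \qquad \{\text{“1-vs.-2”}\} \tag{25}$$

$$\mathrm{LR}_1 = e^{\gamma_1} \qquad \{\text{“1-vs.-3”}\} \tag{26}$$

$$\mathrm{LR}_2 = e^{\gamma_2} \qquad \{\text{“2-vs.-3”}\}; \tag{27}$$

we can then rearrange terms and divide the equations by a constant factor to
obtain

$$\frac{e^{-\gamma_1}}{\gamma_0}\mathrm{LR}_1 - \frac{e^{-\gamma_2}}{\gamma_0}\mathrm{LR}_2 = 0 \qquad \{\text{“1-vs.-2”}\} \tag{28}$$

$$\frac{e^{-\gamma_1}}{\gamma_0}\mathrm{LR}_1 = \frac{1}{\gamma_0} \qquad \{\text{“1-vs.-3”}\} \tag{29}$$

$$\frac{e^{-\gamma_2}}{\gamma_0}\mathrm{LR}_2 = \frac{1}{\gamma_0} \qquad \{\text{“2-vs.-3”}\}, \tag{30}$$

where $\gamma_0 = 2(e^{-\gamma_1} + e^{-\gamma_2} + 1)$. By inspection, this is again a special case of
(7)-(9), which is illustrated in Fig. 4.

Fig. 4. A special case of the ideal observer decision rule with $\gamma_{121} = \gamma_{131} = e^{-\gamma_1}/\gamma_0$,
$\gamma_{212} = \gamma_{232} = e^{-\gamma_2}/\gamma_0$, $\gamma_{313} = \gamma_{323} = 1/\gamma_0$, and $\gamma_0 = 2(e^{-\gamma_1} + e^{-\gamma_2} + 1)$. The
parameters $\gamma_1$ and $\gamma_2$ are arbitrary; this decision rule is a special case of the Scurfield
decision rule with $\mathbf{y}_1 = \log(\mathrm{LR}_1(\mathbf{x}))$ and $\mathbf{y}_2 = \log(\mathrm{LR}_2(\mathbf{x}))$.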
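
Both correspondences derived so far can be spot-checked numerically. The Python sketch below compares the Scurfield rule with the ideal observer rule built from (7)-(9), using the parameter values listed in the captions of Figs. 3 and 4, at randomly drawn points in likelihood ratio space. The function names, the sampling ranges, and the small margin used to skip points too close to a boundary for floating-point comparison are assumptions made for this illustration; agreement on a random sample is a sanity check, not a proof.

```python
import math
import random

def scurfield_decision(y1, y2, gamma1, gamma2):
    """Scurfield rule (13)-(15); boundary points fall through to class 3."""
    if y1 - y2 > gamma1 - gamma2 and y1 > gamma1:
        return 1
    if y1 - y2 < gamma1 - gamma2 and y2 > gamma2:
        return 2
    return 3

def ideal_decision(lr1, lr2, g):
    """Three-class ideal observer rule built from the lines (7)-(9)."""
    s12 = g["121"] * lr1 - g["212"] * lr2 - (g["313"] - g["323"])
    s13 = g["131"] * lr1 + (g["232"] - g["212"]) * lr2 - g["313"]
    s23 = (g["131"] - g["121"]) * lr1 + g["232"] * lr2 - g["323"]
    if s12 > 0 and s13 > 0:
        return 1
    if s12 <= 0 and s23 > 0:
        return 2
    return 3

def params_lr(gamma1, gamma2):
    """Parameters from the caption of Fig. 3 (y = LR, positive thresholds)."""
    g0 = gamma1 + gamma2 + 4
    return {"121": 1 / g0, "212": 1 / g0, "131": 1 / g0, "232": 1 / g0,
            "313": gamma1 / g0, "323": gamma2 / g0}

def params_log(gamma1, gamma2):
    """Parameters from the caption of Fig. 4 (y = log LR)."""
    g0 = 2 * (math.exp(-gamma1) + math.exp(-gamma2) + 1)
    a, b = math.exp(-gamma1) / g0, math.exp(-gamma2) / g0
    return {"121": a, "131": a, "212": b, "232": b,
            "313": 1 / g0, "323": 1 / g0}

def near_boundary(y1, y2, gamma1, gamma2, margin=1e-9):
    """True if (y1, y2) is within `margin` of any of the lines (16)-(18)."""
    return min(abs(y1 - y2 - (gamma1 - gamma2)),
               abs(y1 - gamma1), abs(y2 - gamma2)) < margin

def count_mismatches(n_points=2000, seed=1):
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_points):
        lr1, lr2 = rng.uniform(0.01, 5.0), rng.uniform(0.01, 5.0)
        # Case 1: y = LR with positive thresholds.
        g1, g2 = rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0)
        if not near_boundary(lr1, lr2, g1, g2):
            mismatches += (scurfield_decision(lr1, lr2, g1, g2)
                           != ideal_decision(lr1, lr2, params_lr(g1, g2)))
        # Case 2: y = log(LR) with unrestricted thresholds.
        h1, h2 = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        y1, y2 = math.log(lr1), math.log(lr2)
        if not near_boundary(y1, y2, h1, h2):
            mismatches += (scurfield_decision(y1, y2, h1, h2)
                           != ideal_decision(lr1, lr2, params_log(h1, h2)))
    return mismatches
```

Calling `count_mismatches()` returns the number of sampled points at which the two rules disagree, which should be zero if the correspondences hold.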

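Finally, if we take $\mathbf{y}_1 = P(\pi_1|\mathbf{x})$ and $\mathbf{y}_2 = P(\pi_2|\mathbf{x})$, and require $0 < \gamma_1 < 1$
and $0 < \gamma_2 < 1$, we obtain

$$P(\pi_1|\mathbf{x}) - P(\pi_2|\mathbf{x}) = \gamma_1 - \gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{31}$$

$$P(\pi_1|\mathbf{x}) = \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{32}$$

$$P(\pi_2|\mathbf{x}) = \gamma_2 \qquad \{\text{“2-vs.-3”}\}, \tag{33}$$

as illustrated in Fig. 5.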

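Fig. 5. A special case of the Scurfield decision rule with $\mathbf{y}_1 = P(\pi_1|\mathbf{x})$ and
$\mathbf{y}_2 = P(\pi_2|\mathbf{x})$.

Note that (3) can be written as

$$\begin{aligned}
\mathrm{LR}_i &= \frac{P(\pi_i|\mathbf{x})\,p(\mathbf{x})/P(\pi_i)}{p(\mathbf{x}|\pi_3)} \qquad \{i : 1 \leq i \leq 2\} \\
P(\pi_i|\mathbf{x}) &= \frac{\mathrm{LR}_i P(\pi_i)}{p(\mathbf{x})/p(\mathbf{x}|\pi_3)} \\
P(\pi_i|\mathbf{x}) &= \frac{\mathrm{LR}_i [P(\pi_i)/P(\pi_3)]}{1 + \mathrm{LR}_1[P(\pi_1)/P(\pi_3)] + \mathrm{LR}_2[P(\pi_2)/P(\pi_3)]}.
\end{aligned} \tag{34}$$

This allows us to rewrite (31)-(33) as

$$\frac{1 - (\gamma_1 - \gamma_2)}{\gamma_0}\frac{P(\pi_1)}{P(\pi_3)}\mathrm{LR}_1 - \frac{1 + (\gamma_1 - \gamma_2)}{\gamma_0}\frac{P(\pi_2)}{P(\pi_3)}\mathrm{LR}_2 = \frac{\gamma_1 - \gamma_2}{\gamma_0} \tag{35}$$

$$\frac{1 - \gamma_1}{\gamma_0}\frac{P(\pi_1)}{P(\pi_3)}\mathrm{LR}_1 - \frac{\gamma_1}{\gamma_0}\frac{P(\pi_2)}{P(\pi_3)}\mathrm{LR}_2 = \frac{\gamma_1}{\gamma_0} \tag{36}$$

$$-\frac{\gamma_2}{\gamma_0}\frac{P(\pi_1)}{P(\pi_3)}\mathrm{LR}_1 + \frac{1 - \gamma_2}{\gamma_0}\frac{P(\pi_2)}{P(\pi_3)}\mathrm{LR}_2 = \frac{\gamma_2}{\gamma_0}, \tag{37}$$

respectively, where $\gamma_0 = (2 - 2\gamma_1 + \gamma_2)P(\pi_1)/P(\pi_3) + (2 + \gamma_1 - 2\gamma_2)P(\pi_2)/P(\pi_3) + \gamma_1 + \gamma_2$.
This is again a special case of (7)-(9), as the quantities $1 - (\gamma_1 - \gamma_2)$,
$1 + (\gamma_1 - \gamma_2)$, $1 - \gamma_1$, and $1 - \gamma_2$ are all positive given $0 < \gamma_1 < 1$ and $0 < \gamma_2 < 1$.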
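
The step from thresholds on a posteriori probabilities to lines in likelihood ratio space can be illustrated with exact arithmetic. The Python sketch below converts a pair of likelihood ratios to a posteriori class probabilities using the last line of (34), and evaluates the “1-vs.-3” line in the form obtained by clearing the denominator (that is, (36) before division by the normalizing constant). The a priori probabilities and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction as F

PRIORS = (F(1, 5), F(3, 10), F(1, 2))  # assumed P(pi_1), P(pi_2), P(pi_3)

def posteriors(lr1, lr2, priors=PRIORS):
    """A posteriori class probabilities from the likelihood ratios, as in (34)."""
    p1, p2, p3 = priors
    w1, w2 = lr1 * p1 / p3, lr2 * p2 / p3
    denom = 1 + w1 + w2
    return w1 / denom, w2 / denom, 1 / denom

def line_36_lhs_minus_rhs(lr1, lr2, gamma1, priors=PRIORS):
    """Left side minus right side of (36), before division by gamma_0."""
    p1, p2, p3 = priors
    return (1 - gamma1) * (p1 / p3) * lr1 - gamma1 * (p2 / p3) * lr2 - gamma1
```

With both likelihood ratios equal to 1 the observation carries no information, and `posteriors(F(1), F(1))` returns the assumed priors unchanged. More generally, the first a posteriori probability exceeds the threshold exactly when `line_36_lhs_minus_rhs` is positive, because the denominator in (34) is always positive.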


Scurfield points out (Scurfield, 1998) that the observer which maximizes $P_c$,
the “percent correct” or probability of a correct response, is a special case
of the ideal observer (i.e., a single operating point achievable by the ideal
observer for the given task). This observer follows the Scurfield decision rule
model with $\mathbf{y}_1 = \log(\mathrm{LR}_1(\mathbf{x}))$ and $\mathbf{y}_2 = \log(\mathrm{LR}_2(\mathbf{x}))$, and decision parameters
given by $e^{\gamma_1} = P(\pi_3)/P(\pi_1)$ and $e^{\gamma_2} = P(\pi_3)/P(\pi_2)$. It is interesting to note
that the Scurfield decision rule model can in fact be used to describe ideal
observer performance for an even wider class of operating points, as shown in
this section.
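
The operating point that maximizes the probability of a correct response can be illustrated as follows. The Python sketch below compares a decision that picks the class with the largest product of a priori probability and likelihood (which is the standard way to maximize percent correct) with the Scurfield rule applied to log likelihood ratios using the thresholds quoted above. The function names and the example priors are assumptions made for illustration.

```python
import math

def map_decision(lr1, lr2, priors):
    """Pick the class maximizing P(pi_k) p(x|pi_k), scaled by 1 / p(x|pi_3)."""
    p1, p2, p3 = priors
    scores = (p1 * lr1, p2 * lr2, p3)
    return 1 + max(range(3), key=lambda k: scores[k])

def scurfield_log_decision(lr1, lr2, priors):
    """Scurfield rule on y = log(LR) with exp(gamma_i) = P(pi_3) / P(pi_i)."""
    p1, p2, p3 = priors
    gamma1, gamma2 = math.log(p3 / p1), math.log(p3 / p2)
    y1, y2 = math.log(lr1), math.log(lr2)
    if y1 - y2 > gamma1 - gamma2 and y1 > gamma1:
        return 1
    if y1 - y2 < gamma1 - gamma2 and y2 > gamma2:
        return 2
    return 3

priors = (0.2, 0.3, 0.5)  # assumed a priori probabilities
points = [(4.0, 1.0), (1.0, 4.0), (1.0, 1.0), (4.0, 3.0)]
decisions = [(map_decision(a, b, priors), scurfield_log_decision(a, b, priors))
             for a, b in points]
```

For the four sample points, which were chosen away from the decision boundaries, the two rules give the same decisions.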

To evaluate the performance of an observer using the decision rule in (16)-(18),
Scurfield plots a set of six surfaces in three-dimensional ROC spaces,
giving $P(\mathbf{d} = \pi_2|\mathbf{t} = \sigma(\pi_2))$ as a function of $P(\mathbf{d} = \pi_1|\mathbf{t} = \sigma(\pi_1))$ and
$P(\mathbf{d} = \pi_3|\mathbf{t} = \sigma(\pi_3))$. Here $\sigma$ is one of the six possible permutations of three
symbols, which together form what is known as the symmetric group on three
letters (Clark, 1984). Scurfield gives a probabilistic interpretation for this
evaluation methodology; the volume under each surface is the probability of a
particular outcome in a three-alternative forced choice experiment, and thus
the six volumes must sum to one. One can, however, consider an alternative
formulation motivated strictly by economy rather than elegance of interpretation.
From this point of view, we claim that only four such surfaces are
necessary to completely characterize the observer’s performance. Without loss
of generality, we consider plotting each of $P(\mathbf{d} = \pi_2|\mathbf{t} = \pi_1)$, $P(\mathbf{d} = \pi_2|\mathbf{t} = \pi_3)$,
$P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_1)$, and $P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_2)$ as functions of $P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_2)$ and
$P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_3)$. (As with Scurfield’s plots, these are well defined because
Scurfield’s decision rule has two degrees of freedom, namely the parameters
$\gamma_1$ and $\gamma_2$.)

Now consider one of Scurfield’s plots, for example that which gives
$P(\mathbf{d} = \pi_2|\mathbf{t} = \pi_2)$ as a function of $P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_1)$ and $P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_3)$. Because
these are conditional probabilities, we have

$$P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_1) = 1 - P(\mathbf{d} = \pi_2|\mathbf{t} = \pi_1) - P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_1) \tag{38}$$

$$P(\mathbf{d} = \pi_2|\mathbf{t} = \pi_2) = 1 - P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_2) - P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_2) \tag{39}$$

$$P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_3) = 1 - P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_3) - P(\mathbf{d} = \pi_2|\mathbf{t} = \pi_3). \tag{40}$$

Each of the conditional probabilities on the right hand side of these equations
can be written as functions of $P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_2)$ and $P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_3)$ in our
formulation; thus the surface given in this plot is determined parametrically by
the set of four surfaces we have given. Similar remarks hold for the other five
surfaces used by Scurfield. In general, for an $N$-class classification task using a
Scurfield-type decision rule with $N - 1$ degrees of freedom (the generalization
to $N$ classes of (16)-(18)), one can show that a set of $(N - 1)^2$ hypersurfaces
with $N - 1$ degrees of freedom in $N$-dimensional ROC spaces is sufficient
to fully characterize the observer’s performance, rather than the set of $N!$
hypersurfaces used by Scurfield.
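
One way to arrive at the count of (N - 1)^2 is as follows: there are N^2 conditional classification probabilities, N normalization constraints of the kind shown in (38)-(40) leave N(N - 1) of them free, and choosing N - 1 of those as the independent variables leaves N(N - 1) - (N - 1) = (N - 1)^2 dependent quantities, each giving one hypersurface. The short Python sketch below tabulates this count against N! for a few values of N; the function names are assumptions made for illustration.

```python
from math import factorial

def surfaces_needed(n_classes):
    """(N - 1)^2 hypersurfaces for a rule with N - 1 degrees of freedom."""
    free_probabilities = n_classes * n_classes - n_classes  # after N constraints
    independent = n_classes - 1                             # degrees of freedom
    return free_probabilities - independent

def surfaces_scurfield(n_classes):
    """N! hypersurfaces, one per permutation of the class labels."""
    return factorial(n_classes)

table = [(n, surfaces_needed(n), surfaces_scurfield(n)) for n in range(2, 7)]
```

For N = 3 this gives four surfaces rather than six, and the gap widens quickly: nine rather than 24 for N = 4.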


4  The  Chan  Decision  Rule 


Chan et al. are investigating three-class classifiers for computer-aided diagnosis
(Chan et al., 2003). Their work is motivated by reasoning similar in
principle to that which we independently arrived at when we began to consider
this problem. In particular, they consider a clinical situation in which
observations must be classified as malignant, benign, or normal. Because the
goal of their work is to optimize the performance of a system to aid a radiologist
or clinician, rather than to measure the psychophysical performance of an
existing observer, they choose to start explicitly from an ideal observer model
in constructing their decision rule.

In order to reduce the complexity of the ideal observer decision rule to manageable
proportions, Chan et al. impose restrictions on the utilities used by
their observer. In their formulation, the class we are labelling $\pi_1$ is the benign
class; $\pi_2$, the normal class; and the malignant class is $\pi_3$. They further
assume that the possible values of any utility $U_{i|j}$ are restricted to the interval
$[0, 1]$. They then set $U_{1|1} = U_{2|2} = U_{3|3} = 1$ (i.e., correctly identifying
any case has maximal utility). Furthermore, they require $U_{2|1} = U_{1|2} = 1$
and $U_{1|3} = U_{2|3} = 0$ (i.e., misidentifying a benign case as normal, or vice
versa, has no significant cost reducing the utility of such a decision from the
maximum, but misclassifying an actually malignant case as benign or normal
has the minimum possible utility). Finally, $U_{3|1}$ and $U_{3|2}$ are assumed to have
arbitrary values on the open interval $(0, 1)$ (i.e., misclassifying an actually
non-malignant case as malignant will have some cost reducing the utility of
such a decision from the maximum, but such a misclassification is in some
sense “better” than missing an actual malignancy). It is important to note
that these assumptions are arguably relevant to a reasonable model of a clinical
situation, and are thus of interest beyond their superficial advantage in
reducing the degrees of freedom involved in the observer’s decision rule. We
will, however, only consider the latter issue in the remainder of this section.

Substituting the values of the utilities given above into (4), we obtain decision
boundary lines of the form

$$0\,\mathrm{LR}_1 + 0\,\mathrm{LR}_2 = 0 \qquad \{\text{“1-vs.-2”}\} \tag{41}$$

$$\frac{(1 - U_{3|1})P(\pi_1)}{\gamma_0}\mathrm{LR}_1 + \frac{(1 - U_{3|2})P(\pi_2)}{\gamma_0}\mathrm{LR}_2 = \frac{P(\pi_3)}{\gamma_0} \qquad \{\text{“1-vs.-3”}\} \tag{42}$$

$$\frac{(1 - U_{3|1})P(\pi_1)}{\gamma_0}\mathrm{LR}_1 + \frac{(1 - U_{3|2})P(\pi_2)}{\gamma_0}\mathrm{LR}_2 = \frac{P(\pi_3)}{\gamma_0} \qquad \{\text{“2-vs.-3”}\}, \tag{43}$$

where $\gamma_0 = 1 + P(\pi_3) - U_{3|1}P(\pi_1) - U_{3|2}P(\pi_2)$. Note that, as Chan et al. point
out, the “1-vs.-2” line is in fact undefined for this choice of utilities, while the
“1-vs.-3” and “2-vs.-3” lines are identical. This is a general consequence of
(7)-(9); if any two of these equations yield identical lines, the third line must
be either identical to them or undefined. (Note that, strictly speaking, the
utility structure employed by Chan et al. is excluded from our formulation by
the requirement stated in (6). However, the question of whether the ideal
observer’s performance should be considered to include such limiting cases
is largely a definitional, rather than a fundamental, issue.)
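
The degeneracy described above can be checked directly from the definition (4). The Python sketch below builds the six parameters for the utility structure of Chan et al. and forms the coefficients of the three boundary lines (7)-(9), without the overall normalization. The numerical values chosen for the two free utilities and for the a priori probabilities, and the function names, are assumptions made for illustration.

```python
from fractions import Fraction as F

def chan_gammas(u31, u32, priors):
    """Parameters gamma_iji of (4) for the utility structure of Chan et al."""
    # U[d][t]: utility of deciding class d+1 when the truth is class t+1
    # (class 1 benign, class 2 normal, class 3 malignant).
    U = [[1, 1, 0],
         [1, 1, 0],
         [u31, u32, 1]]

    def g(i, j):
        # gamma_iji = (U_{i|i} - U_{j|i}) P(pi_i), with 1-based i and j
        return (U[i - 1][i - 1] - U[j - 1][i - 1]) * priors[i - 1]

    return {"121": g(1, 2), "212": g(2, 1), "131": g(1, 3),
            "232": g(2, 3), "313": g(3, 1), "323": g(3, 2)}

def boundary_lines(g):
    """Coefficients (a, b, c) of a*LR1 + b*LR2 = c for lines (7), (8), (9)."""
    return {"1-vs-2": (g["121"], -g["212"], g["313"] - g["323"]),
            "1-vs-3": (g["131"], g["232"] - g["212"], g["313"]),
            "2-vs-3": (g["131"] - g["121"], g["232"], g["323"])}

priors = (F(1, 5), F(3, 10), F(1, 2))          # assumed P(pi_1), P(pi_2), P(pi_3)
gammas = chan_gammas(F(1, 4), F(1, 2), priors)  # assumed U_{3|1}, U_{3|2}
lines = boundary_lines(gammas)
```

Under these assumptions all coefficients of the “1-vs.-2” line vanish, the other two lines have identical coefficients, and the sum of the six parameters reproduces the normalizing constant quoted in the text.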

The decision rule considered by Chan et al. is illustrated in Fig. 6. It can be
argued that, in a sense, the output of this classifier belongs to only two classes,
malignant and non-malignant; in particular, because (41) is undefined, this
observer will never unequivocally decide $\mathbf{d} = \pi_1$ (benign) or $\mathbf{d} = \pi_2$ (normal).
In fact, if $U_{3|1} = U_{3|2}$, the observer’s performance is identical with that of a
two-class ideal observer which distinguishes between the malignant and
non-malignant (benign plus normal) classes. However, in the more general case in
which $U_{3|1} \neq U_{3|2}$, the observer considered by Chan et al. is able to achieve
ROC operating points not accessible by the two-class ideal observer. (That
is, the three-class ideal observer can achieve points below the two-class ideal
observer’s ROC curve in a two-class ROC space, or, equivalently, points off
the curve representing the two-class ideal observer’s performance plotted in a
three-class ROC space.) Intuitively, their observer makes decisions based on
the three distribution functions of the observational data, even though the
observer’s output consists of only two possible responses.

Fig. 6. The decision rule investigated by Chan et al., which is a special
case of the ideal observer decision rule with $\gamma_{121} = \gamma_{212} = 0$,
$\gamma_{131} = (1 - U_{3|1})P(\pi_1)/\gamma_0$, $\gamma_{232} = (1 - U_{3|2})P(\pi_2)/\gamma_0$, and $\gamma_{313} = \gamma_{323} = P(\pi_3)/\gamma_0$;
here $\gamma_0 = 1 + P(\pi_3) - U_{3|1}P(\pi_1) - U_{3|2}P(\pi_2)$. Observations in the unlabelled
region are decided “not $\pi_3$”, i.e., either “$\pi_1$” or “$\pi_2$”. The intercepts $\gamma_1$ and $\gamma_2$ are
$P(\pi_3)/[(1 - U_{3|1})P(\pi_1)]$ and $P(\pi_3)/[(1 - U_{3|2})P(\pi_2)]$, respectively.
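
The reduction to a two-class ideal observer when the two free utilities are equal can be made explicit with a short worked derivation. The symbols $U$ (the common value of the two utilities) and $\Lambda$ (the likelihood ratio of the pooled non-malignant class to the malignant class) are introduced here only for illustration.

```latex
% Assume U_{3|1} = U_{3|2} = U. The coincident boundary lines then read
\[
(1 - U)\bigl[P(\pi_1)\,\mathrm{LR}_1 + P(\pi_2)\,\mathrm{LR}_2\bigr] = P(\pi_3).
\]
% The density of the pooled non-malignant class is the prior-weighted mixture
% of the benign and normal densities, so its likelihood ratio against the
% malignant class is
\[
\Lambda \equiv
\frac{P(\pi_1)\,p(\mathbf{x}|\pi_1) + P(\pi_2)\,p(\mathbf{x}|\pi_2)}
     {[P(\pi_1) + P(\pi_2)]\,p(\mathbf{x}|\pi_3)}
= \frac{P(\pi_1)\,\mathrm{LR}_1 + P(\pi_2)\,\mathrm{LR}_2}{P(\pi_1) + P(\pi_2)}.
\]
% Hence the boundary is a single threshold on the two-class likelihood ratio:
\[
\Lambda = \frac{P(\pi_3)}{(1 - U)\,[P(\pi_1) + P(\pi_2)]}.
\]
```

When the two utilities differ, the left side of the boundary equation is no longer proportional to $\Lambda$, which is why the observer can then reach operating points that a threshold on $\Lambda$ alone cannot.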

Chan et al. evaluate the performance of their observer by plotting
$P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_3)$ as a function of $P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_1)$ and $P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_2)$. Note
that, unlike the case for the Scurfield decision rule (and for the Mossman
decision rule, as shown in the next section), this single two-dimensional surface
is sufficient to completely characterize the performance of their observer. This
is because, as just stated, the observer’s output consists of only two possible
responses, and thus we have only six classification probabilities
$P(\mathbf{d} = \pi_i|\mathbf{t} = \pi_j)$ rather than the nine expected in a three-class classification task. These six
conditional probabilities are still constrained by three equations, however:

$$P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_1) + P(\mathbf{d} = \bar{\pi}_3|\mathbf{t} = \pi_1) = 1 \tag{44}$$

$$P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_2) + P(\mathbf{d} = \bar{\pi}_3|\mathbf{t} = \pi_2) = 1 \tag{45}$$

$$P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_3) + P(\mathbf{d} = \bar{\pi}_3|\mathbf{t} = \pi_3) = 1, \tag{46}$$

where the expression $\mathbf{d} = \bar{\pi}_3$ indicates that the observer decides that the
observation does not belong to class $\pi_3$. These constraint equations allow us
to eliminate three of the six conditional probabilities, leaving a single ROC
surface with two degrees of freedom in a three-dimensional ROC space.


5  The  Mossman  Decision  Rule 


Mossman investigates a decision rule applied to a set of three decision variables
$\mathbf{y}_1$, $\mathbf{y}_2$, and $\mathbf{y}_3$, subject to the constraint

$$\mathbf{y}_1 + \mathbf{y}_2 + \mathbf{y}_3 = 1, \tag{47}$$

as well as $0 \leq \mathbf{y}_i \leq 1$ $\{1 \leq i \leq 3\}$. This is consistent with the constraint
on the a posteriori class probabilities, $P(\pi_1|\mathbf{x}) + P(\pi_2|\mathbf{x}) + P(\pi_3|\mathbf{x}) = 1$;
these quantities are known to be directly related to the likelihood ratio ideal
observer decision variables (Kupinski, Edwards, Giger, and Metz, 2001; Edwards,
Lan, Metz, Giger, and Nishikawa, 2004a). Mossman does not explicitly
require, however, that the decision variables in (47) be the a posteriori class
probabilities (e.g., they may be noisy estimates of these quantities).

The decision rule considered by Mossman, which depends on two decision
parameters $\gamma_1$ and $\gamma_2$, is

$$\text{decide } \mathbf{d} = \pi_1 \text{ iff } \mathbf{y}_2 - \mathbf{y}_1 < \gamma_2 \text{ and } \mathbf{y}_3 < \gamma_1; \tag{48}$$

$$\text{decide } \mathbf{d} = \pi_2 \text{ iff } \mathbf{y}_2 - \mathbf{y}_1 > \gamma_2 \text{ and } \mathbf{y}_3 < \gamma_1; \tag{49}$$

$$\text{decide } \mathbf{d} = \pi_3 \text{ iff } \mathbf{y}_3 > \gamma_1, \tag{50}$$

where $0 < \gamma_1 < 1$ and $-1 < \gamma_2 < 1$. From these relations, and given the
relation $\mathbf{y}_3 = 1 - \mathbf{y}_1 - \mathbf{y}_2$ from (47), one can define the decision boundary lines

$$\mathbf{y}_1 - \mathbf{y}_2 = -\gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{51}$$

$$\mathbf{y}_1 + \mathbf{y}_2 = 1 - \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{52}$$

$$\mathbf{y}_1 + \mathbf{y}_2 = 1 - \gamma_1 \qquad \{\text{“2-vs.-3”}\}. \tag{53}$$

This decision rule is illustrated in Fig. 7. Note that, similar to the Chan et al.
decision rule, the “1-vs.-3” and “2-vs.-3” decision boundary lines are identical.

Fig. 7. Decision rule investigated by Mossman, for the decision parameters $\gamma_1$ and
$\gamma_2$, shown in the a posteriori class probability space.

We now consider a special case of the Mossman decision rule in which
$\mathbf{y}_1 = P(\pi_1|\mathbf{x})$, $\mathbf{y}_2 = P(\pi_2|\mathbf{x})$, and $\mathbf{y}_3 = P(\pi_3|\mathbf{x})$ for some observational data vector
$\mathbf{x}$. Note that (3) can be written as

$$\begin{aligned}
\mathrm{LR}_i &= \frac{P(\pi_i|\mathbf{x})\,p(\mathbf{x})/P(\pi_i)}{p(\mathbf{x}|\pi_3)} \qquad \{i : 1 \leq i \leq 2\} \\
P(\pi_i|\mathbf{x}) &= \frac{\mathrm{LR}_i P(\pi_i)}{p(\mathbf{x})/p(\mathbf{x}|\pi_3)} \\
P(\pi_i|\mathbf{x}) &= \frac{\mathrm{LR}_i P(\pi_i)/P(\pi_3)}{1 + \mathrm{LR}_1[P(\pi_1)/P(\pi_3)] + \mathrm{LR}_2[P(\pi_2)/P(\pi_3)]}.
\end{aligned} \tag{54}$$

This allows us to rewrite (51)-(53) as

$$(1 + \gamma_2)\frac{P(\pi_1)}{P(\pi_3)}\mathrm{LR}_1 - (1 - \gamma_2)\frac{P(\pi_2)}{P(\pi_3)}\mathrm{LR}_2 = -\gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{55}$$

$$\gamma_1\frac{P(\pi_1)}{P(\pi_3)}\mathrm{LR}_1 + \gamma_1\frac{P(\pi_2)}{P(\pi_3)}\mathrm{LR}_2 = 1 - \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{56}$$

$$\gamma_1\frac{P(\pi_1)}{P(\pi_3)}\mathrm{LR}_1 + \gamma_1\frac{P(\pi_2)}{P(\pi_3)}\mathrm{LR}_2 = 1 - \gamma_1 \qquad \{\text{“2-vs.-3”}\}. \tag{57}$$

This version of the decision rule is illustrated in Fig. 8.

Fig. 8. Decision rule investigated by Mossman, for the decision parameters $\gamma_1$ and
$\gamma_2$, shown in likelihood ratio space.

Although the Mossman decision rule appears similar in form to the ideal observer
decision rule, recall from Sec. 4 that if two of the decision boundary line
equations are identical, the third must yield a line identical to the first two or
be undefined. Another way to see this is to note that the coefficients of (9) are
differences of the corresponding coefficients of (7) and (8). If the coefficients
of (8) and (9) are identical, it must be the case that the coefficients of (7)
are all zero. For the Mossman decision rule, this would require $1 + \gamma_2 = 0$,
$1 - \gamma_2 = 0$, and $\gamma_2 = 0$ simultaneously, which is clearly impossible. It follows
that the decision rule considered by Mossman cannot represent possible ideal
observer performance for any choice of the utilities $U_{i|j}$ in (1) and (2).
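
The argument can be illustrated numerically. The Python sketch below forms the coefficients of the three Mossman boundary lines in likelihood ratio space and tests, over a grid of parameter values, whether the “1-vs.-2” line is either undefined (all coefficients zero) or coincident with the other two lines, as an ideal observer with coincident “1-vs.-3” and “2-vs.-3” lines would require. The a priori probabilities, the grid, and the function names are assumptions made for illustration; a grid search is a sanity check rather than a proof.

```python
from fractions import Fraction as F

def mossman_lines(gamma1, gamma2, priors):
    """Coefficients (a, b, c) of a*LR1 + b*LR2 = c for (55), (56), (57)."""
    p1, p2, p3 = priors
    a1, a2 = p1 / p3, p2 / p3
    line_12 = ((1 + gamma2) * a1, -(1 - gamma2) * a2, -gamma2)
    line_13 = (gamma1 * a1, gamma1 * a2, 1 - gamma1)
    return line_12, line_13, line_13  # (56) and (57) are the same line

def is_zero(line):
    return all(c == 0 for c in line)

def proportional(u, v):
    """True if the coefficient triples u and v are linearly dependent."""
    return (u[0] * v[1] == u[1] * v[0] and
            u[0] * v[2] == u[2] * v[0] and
            u[1] * v[2] == u[2] * v[1])

def compatible_with_ideal_observer(gamma1, gamma2, priors):
    """Check the condition required when two ideal observer lines coincide."""
    l12, l13, _ = mossman_lines(gamma1, gamma2, priors)
    return is_zero(l12) or proportional(l12, l13)

priors = (F(1, 5), F(3, 10), F(1, 2))  # assumed P(pi_1), P(pi_2), P(pi_3)
any_compatible = any(
    compatible_with_ideal_observer(F(i, 10), F(j, 10), priors)
    for i in range(1, 10)      # 0 < gamma1 < 1
    for j in range(-9, 10))    # -1 < gamma2 < 1
```

For the assumed priors, `any_compatible` is `False`: no grid point yields a “1-vs.-2” line that is undefined or coincident with the other two.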

Mossman proposed that the ROC surface obtained by plotting
$P(\mathbf{d} = \pi_3|\mathbf{t} = \pi_3)$ as a function of $P(\mathbf{d} = \pi_1|\mathbf{t} = \pi_1)$ and $P(\mathbf{d} = \pi_2|\mathbf{t} = \pi_2)$ be used to
evaluate the performance of the observer. Although this surface is clearly
well-defined (the Mossman decision rule has two degrees of freedom, namely the
parameters $\gamma_1$ and $\gamma_2$), it follows from the discussion at the end of Sec. 3 that
four such surfaces in three-dimensional ROC spaces are needed to completely
characterize the observer’s performance.


6  Discussion  and  Conclusions 


We examined three decision rules proposed recently for three-class classification
tasks by different researchers. The basis for our evaluation was ideal
observer decision theory, primarily because our own interest in the three-class
classification task is its possible application to CAD. A major goal in the
development of a computerized scheme for CAD is the optimization of the
performance of that scheme, in order to provide the maximum benefit to clinicians
and thus to their patients. It should thus be kept clearly in mind that
the ideal observer framework may not be as relevant, for example, to work
which is motivated by purely psychophysical considerations (Scurfield, 1996,
1998; Mossman, 1999), i.e., where the goal is to estimate the properties
of an existing observer.

That being said, the three-class classification task is difficult enough that it is
perhaps worth making any attempt to analyze, from a single point of view, the
work of the relatively few researchers investigating this problem, even in cases
where that point of view is not necessarily relevant to the underlying motivations
for that work. We feel the insights we have gained from the analysis of
various decision rules presented here should provide at least some justification
for that claim.

In particular, Scurfield points out (Scurfield, 1998) that his proposed decision
rule is in fact an ideal observer decision rule for a single ideal observer operating
point, namely the observer which maximizes the probability of any correct
response (or “percent correct” or $P_c$). We were able to show that, under various
assumptions, a larger set of such correspondences between the Scurfield
observer and the ideal observer exists.

Chan et al. are working on the application of three-class classification to CAD,
and thus explicitly take the ideal observer as the starting point in the development
of their decision rule (Chan et al., 2003). Although this rendered our
analysis of that decision rule in terms of ideal observer decision theory largely
trivial, their decision rule merits attention as an example of a situation in
which the ideal observer is indeed making use of information from the three
classes of observations (i.e., its behavior is demonstrably different from that
of a two-class ideal observer), while only producing two different responses for
those observations. In two-class classification, the only corresponding examples
are trivial: either the observer always calls observations positive (achieving
an operating point of (FPF = 1, TPF = 1), where FPF is the false-positive
fraction and TPF the true-positive fraction) or always calls them negative
(FPF = 0, TPF = 0).


Finally, we showed that a decision rule proposed by Mossman (Mossman, 1999)
does not correspond to ideal observer behavior for any possible values of the
utilities used by the ideal observer. However, we note that the structure of the
Mossman decision rule (a simple sequence of thresholds on single decision
variables) may indeed serve as a reasonable model for human observer performance
in certain situations, e.g., differential diagnosis. That such a decision
rule fails to be an ideal observer decision rule may be considered surprising,
given the properties the Mossman decision rule shares with that of Chan et al.,
in particular the identity of two out of the three decision boundary lines.
The reasons why one decision rule can be said to always correspond to ideal
observer behavior, while a rule similar in structure never can, are connected to
fundamental constraints on the ideal observer’s behavior; given the inherent
complexities of the three-class classification task, it is easy for such properties
to become “lost in the noise,” so to speak. A close comparison of two possible
three-class classification decision rules can thus provide an immediate and
intuitive understanding of such properties, even though a complete and fully
general solution to the three-class classification problem remains elusive.

Restrictions on the Three-Class Ideal
Observer’s Decision Boundary Lines

Darrin  C.  Edwards*  and  Charles  E.  Metz 


Abstract 

We  are  attempting  to  develop  expressions  for  the  coordinates  of  points  on  the  three-class  ideal 
observer’s  receiver  operating  characteristic  (ROC)  hypersurface  as  functions  of  the  set  of  decision  criteria 
used  by  the  ideal  observer.  This  is  considerably  more  difficult  than  in  the  two-class  classification  task, 
because  the  conditional  probabilities  in  question  are  not  simply  related  to  the  cumulative  distribution 
functions  of  the  decision  variables,  and  because  the  slopes  and  intercepts  of  the  decision  boundary 
lines  are  not  independent;  given  the  locations  of  two  of  the  lines,  the  location  of  the  third  will  be 
constrained  depending  on  the  other  two.  In  the  present  work  we  attempt  to  characterize  those  constraining 
relationships  among  the  three-class  ideal  observer’s  decision  boundary  lines.  As  a  result,  we  show  that 
the  relationship  between  the  decision  criteria  and  the  misclassification  probabilities  is  not  one-to-one, 
as it is for the two-class ideal observer.

I.  Introduction 

Receiver  operating  characteristic  (ROC)  analysis  is  the  accepted  methodology  for  analyzing 
the  performance  of  a  two-class  classifier  [1],  in  particular  for  medical  decision-making  tasks 
in  which  a  patient  is  diagnosed  as  having  a  particular  condition  or  not  based  on  features  of  a 
medical  image  [2].  In  judging  the  performance  of  an  observer  measured  via  ROC  analysis,  the 
standard  for  comparison  is  the  so-called  ideal  observer,  that  observer  which  outperforms  any  other 
possible  observer  given  the  statistical  variability  of  the  observational  data  being  classified  [1], 
[3].  Although  the  general  form  of  the  ideal  observer  in  a  classification  task  with  three  or  more 
classes  has  been  known  for  some  time  [3],  the  considerable  complexities  inherent  to  this  model 
compared  to  the  two-class  classification  task  have  hampered  the  development  of  extensions  of 
ROC  analysis  which  are  both  fully  general  and  practically  useful.  (Several  researchers  have 
recently proposed restricted observer models or restricted evaluation methods [4]-[7].)

Despite  these  difficulties,  research  continues  in  this  area  because  the  advantages  to  be  gained 
from  a  three-class  classifier  and  appropriate  evaluation  methodology  are  considerable.  In  our  own 
case, we seek to combine existing computer-aided diagnosis (CAD) schemes for detecting [8]-[12]
mammographic mass lesions and classifying them as malignant or benign [13]-[17]. The
combined  scheme  would  serve  as  a  fully  automated  classifier  (the  existing  classifier  requires 
initial  manual  identification  of  lesions  by  a  radiologist),  potentially  allowing  radiologists  to  reduce 
their  false-positive  biopsy  rate  without  reducing  their  sensitivity  for  detection  of  malignancies. 

Our  initial  efforts  toward  this  goal  so  far  have  been  more  theoretical  than  practical.  We  have 
shown that, just as the two-class ideal observer achieves the optimal two-class ROC curve for
a given task, the N-class ideal observer achieves the optimal N-class ROC hypersurface [18].
(Note that the ideal observer is formally defined as that which minimizes the expected Bayes
risk [3], and not in terms of classification performance, making this a non-trivial observation
in both cases.) More soberingly, we found recently that an obvious generalization of the
well-known performance metric, the area under the ROC curve (AUC), is not a useful performance
metric in a classification task with three or more classes [19].

At  present  we  are  attempting  to  develop  expressions  for  the  coordinates  of  points  on  the 
three-class  ideal  observer’s  ROC  hypersurface  (the  conditional  probabilities  for  misclassifying 
observations  [18],  [20],  [21])  as  functions  of  the  set  of  decision  criteria  used  by  the  ideal  observer. 
This  is  considerably  more  difficult  than  in  the  two-class  classification  task  for  two  reasons.  First, 
the  conditional  probabilities  in  question  are  not  simply  related  to  the  cumulative  distribution 
functions  (cdfs)  of  the  decision  variables,  but  are  integrals  of  those  variables  over  domains 
determined  by  three  decision  boundary  lines  [3].  Second,  the  slopes  and  intercepts  of  the  decision 
boundary  lines  are  not  independent;  given  the  locations  of  two  of  the  lines,  we  have  found  recently 
that  the  location  of  the  third  will  be  constrained  depending  on  the  other  two. 

In  the  present  work  we  attempt  to  characterize  the  constraining  relationships  just  mentioned 
among  the  three-class  ideal  observer’s  decision  boundary  lines.  Although  this  work  is  admittedly 
still  removed  from  image  analysis  per  se,  we  hope  it  may  prove  of  interest  to  the  CAD  community 
and  ultimately  of  relevance  to  a  wide  variety  of  medical  image  analysis  tasks.  In  the  next  section 
we  briefly  review  the  structure  of  the  three-class  ideal  observer  and  the  notation  we  have  been 
using to characterize it [18]. In Sec. III, we briefly illustrate that the intersection of the three-class
ideal  observer’s  decision  boundary  lines  may  lie  in  any  of  the  four  quadrants  of  the  decision 
variable  plane.  In  Sec.  IV,  we  show  that  for  a  given  location  (slope  and  y-intercept)  of  the 
decision  boundary  line  separating  the  first  and  third  classes,  the  location  of  one  of  the  remaining 
two  lines  is  constrained  in  a  particular  way  based  on  the  location  of  the  other. 

Given the arbitrariness of the labels applied to the three classes (i.e., which classes are
considered  first,  second,  or  third),  one  would  expect  the  selection  of  the  fixed  line  in  Sec.  IV 
to  be  similarly  arbitrary,  and  indeed  in  Secs.  V  and  VI  we  show  that  corresponding  results  are 
obtained  if  one  takes  the  location  of  the  decision  boundary  line  separating  the  second  and  third, 
or  first  and  second,  classes,  respectively,  to  be  given.  These  results  are  summarized,  and  the 
principal  conclusions  of  the  work  are  given,  in  Sec.  VII. 

II.  The  Three-Class  Ideal  Observer 

In [18], we showed that an N-class ideal observer makes decisions by partitioning a likelihood
ratio decision variable space, where the boundaries of the partitions are given by hyperplanes:

$$\text{decide } d = \pi_i \text{ iff } \sum_{k=1}^{N-1} (U_{i|k} - U_{j|k}) P(t = \pi_k) \mathrm{LR}_k > (U_{j|N} - U_{i|N}) P(t = \pi_N) \quad \{j < i\} \tag{1}$$

$$\text{and } \sum_{k=1}^{N-1} (U_{i|k} - U_{j|k}) P(t = \pi_k) \mathrm{LR}_k > (U_{j|N} - U_{i|N}) P(t = \pi_N) \quad \{j > i\}. \tag{2}$$

(Here $U_{i|j}$ is the utility of deciding an observation is from class $\pi_i$ given that it is actually from
class $\pi_j$.) The partitioning is determined by the parameters

$$\gamma_{ijk} \equiv (U_{i|k} - U_{j|k}) P(t = \pi_k), \tag{3}$$

with $i$, $j$, and $k$ varying from 1 to $N$, and $j \neq i$. Note that these parameters are not independent,
however, because

$$\gamma_{ijk} = \gamma_{kjk} - \gamma_{kik}. \tag{4}$$

We can impose the reasonable condition that the utility for correctly classifying an observation
from a given class should be greater than any utility for incorrectly classifying an observation
from the same class, i.e., $U_{i|i} > U_{j|i}$ $\{i \neq j\}$. This gives, for $j \neq i$,

$$\gamma_{iji} > 0, \tag{5}$$

leaving $N(N - 1)$ parameters (the rest are derivable from (4)).

Finally, note that the hyperplanes represented by (1) and (2) are unchanged if we multiply all
of these equations by a single scalar. This leaves us with $N^2 - N - 1$
degrees of freedom, as expected.
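
As a rough illustration of definition (3) and of relations (4) and (5), the following Python sketch builds the $\gamma_{ijk}$ parameters from a utility matrix and a set of prior probabilities. The utility values, the priors, and the helper name `gamma` are hypothetical choices made only for this example; none of them come from the text.

```python
from fractions import Fraction
from itertools import product

def gamma(U, priors, i, j, k):
    """gamma_{ijk} = (U[i|k] - U[j|k]) * P(t = pi_k), with 1-indexed classes."""
    return (U[i - 1][k - 1] - U[j - 1][k - 1]) * priors[k - 1]

# Hypothetical utilities: U[i][k] is the utility of deciding class i+1
# when the observation is actually from class k+1.
U = [[Fraction(5), Fraction(1), Fraction(0)],
     [Fraction(2), Fraction(4), Fraction(1)],
     [Fraction(0), Fraction(2), Fraction(6)]]
# Hypothetical a priori class probabilities (sum to one).
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

N = 3

# Relation (4): gamma_{ijk} = gamma_{kjk} - gamma_{kik} for all index triples.
relation_4_holds = all(
    gamma(U, priors, i, j, k)
    == gamma(U, priors, k, j, k) - gamma(U, priors, k, i, k)
    for i, j, k in product(range(1, N + 1), repeat=3)
)

# Condition (5): gamma_{iji} > 0 whenever j != i, which holds here because
# each diagonal utility exceeds the other utilities in its column.
positivity_5_holds = all(
    gamma(U, priors, i, j, i) > 0
    for i in range(1, N + 1)
    for j in range(1, N + 1)
    if j != i
)

# The N(N - 1) parameters gamma_{iji}; for N = 3 there are six of them.
basic_parameters = {
    (i, j): gamma(U, priors, i, j, i)
    for i in range(1, N + 1)
    for j in range(1, N + 1)
    if j != i
}

print(relation_4_holds, positivity_5_holds, len(basic_parameters))
```

Exact rational arithmetic is used so that the identity in (4) can be tested with `==` rather than with a floating-point tolerance.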

The  behavior  of  a  three-class  ideal  observer  is  completely  determined  by  the  three  decision 
boundary  lines 


$$\gamma_{121}\mathrm{LR}_1 - \gamma_{212}\mathrm{LR}_2 = \gamma_{313} - \gamma_{323} \tag{6}$$

$$\gamma_{131}\mathrm{LR}_1 + (\gamma_{232} - \gamma_{212})\mathrm{LR}_2 = \gamma_{313} \tag{7}$$

$$(\gamma_{131} - \gamma_{121})\mathrm{LR}_1 + \gamma_{232}\mathrm{LR}_2 = \gamma_{323}, \tag{8}$$

which we call, respectively, the “1-vs.-2” line, the “1-vs.-3” line, and the “2-vs.-3” line. Note
that if any two of these lines intersect, the third line must also share this intersection point
(Eq. (7) is the sum of Eqs. (6) and (8)). We
also emphasize the simple interpretation, from (3), of each of the $\gamma_{iji}$ parameters appearing in
these decision boundary line equations as the difference in utilities between a “correct” and one
particular “incorrect” decision (scaled by the a priori probability of the true class in question);
and of each difference in the $\gamma_{iji}$ parameters as a difference in utilities between two possible
“incorrect” decisions (again scaled by the a priori probability of the true class in question).

Fig. 1. Example ideal observer decision rule in which the intersection point of the decision boundaries lies in the second
quadrant. Here the “1-vs.-2” line has unit slope, the “1-vs.-3” line has slope −1, and the “2-vs.-3” line is horizontal.


III.  The  Intersection  Point 

The intersection point of the three decision boundary lines mentioned at the end of the
preceding section is often shown as occurring in the first quadrant ($\mathrm{LR}_1 > 0$, $\mathrm{LR}_2 > 0$). This
is not, however, a requirement, and we can demonstrate with special cases that this intersection
point can in fact occur in any quadrant. That is, we seek values of the $\gamma_{iji}$ coefficients in (6)-(8),
consistent with the conditions in (5), for which the intersection of the three lines occurs in the
appropriate quadrant.

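Consider a case in which $\gamma_{121} = 1/10$, $\gamma_{212} = 1/10$, $\gamma_{131} = 1/10$, $\gamma_{313} = 1/10$, $\gamma_{232} = 2/10$,
and $\gamma_{323} = 4/10$. This yields the decision boundaries

$$\frac{1}{10}\mathrm{LR}_1 - \frac{1}{10}\mathrm{LR}_2 = -\frac{3}{10} \quad \{\text{“1-vs.-2”}\} \tag{9}$$

$$\frac{1}{10}\mathrm{LR}_1 + \frac{1}{10}\mathrm{LR}_2 = \frac{1}{10} \quad \{\text{“1-vs.-3”}\} \tag{10}$$

$$\frac{2}{10}\mathrm{LR}_2 = \frac{4}{10} \quad \{\text{“2-vs.-3”}\} \tag{11}$$

which intersect at the point ($\mathrm{LR}_1 = -1$, $\mathrm{LR}_2 = 2$), as illustrated in Fig. 1.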

An example of the intersection point occurring in the third quadrant may be obtained by taking
$\gamma_{121} = 3/10$, $\gamma_{212} = 3/10$, $\gamma_{131} = 1/10$, $\gamma_{313} = 1/10$, $\gamma_{232} = 1/10$, and $\gamma_{323} = 1/10$. This yields
the decision boundaries

$$\frac{3}{10}\mathrm{LR}_1 - \frac{3}{10}\mathrm{LR}_2 = 0 \quad \{\text{“1-vs.-2”}\} \tag{12}$$

$$\frac{1}{10}\mathrm{LR}_1 - \frac{2}{10}\mathrm{LR}_2 = \frac{1}{10} \quad \{\text{“1-vs.-3”}\} \tag{13}$$

$$-\frac{2}{10}\mathrm{LR}_1 + \frac{1}{10}\mathrm{LR}_2 = \frac{1}{10} \quad \{\text{“2-vs.-3”}\} \tag{14}$$

which intersect at the point ($\mathrm{LR}_1 = -1$, $\mathrm{LR}_2 = -1$), as illustrated in Fig. 2.

Fig. 2. Example ideal observer decision rule in which the intersection point of the decision boundaries lies in the third quadrant.
Here the “1-vs.-2” line has unit slope, the “1-vs.-3” line has a slope of 1/2, and the “2-vs.-3” line has a slope of 2.

Finally, an example of the intersection point occurring in the fourth quadrant may be obtained
by taking $\gamma_{121} = 1/10$, $\gamma_{212} = 1/10$, $\gamma_{131} = 2/10$, $\gamma_{313} = 4/10$, $\gamma_{232} = 1/10$, and $\gamma_{323} = 1/10$.
This yields the decision boundaries

$$\frac{1}{10}\mathrm{LR}_1 - \frac{1}{10}\mathrm{LR}_2 = \frac{3}{10} \quad \{\text{“1-vs.-2”}\} \tag{15}$$

$$\frac{2}{10}\mathrm{LR}_1 = \frac{4}{10} \quad \{\text{“1-vs.-3”}\} \tag{16}$$

$$\frac{1}{10}\mathrm{LR}_1 + \frac{1}{10}\mathrm{LR}_2 = \frac{1}{10} \quad \{\text{“2-vs.-3”}\} \tag{17}$$

which intersect at the point ($\mathrm{LR}_1 = 2$, $\mathrm{LR}_2 = -1$), as illustrated in Fig. 3.

Fig. 3. Example ideal observer decision rule in which the intersection point of the decision boundaries lies in the fourth
quadrant. Here the “1-vs.-2” line has unit slope, the “1-vs.-3” line is vertical, and the “2-vs.-3” line has a slope of −1.
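
The three special cases of this section can be checked mechanically. The Python sketch below builds the lines (6)-(8) from the six parameter values quoted in the text, intersects the “1-vs.-2” and “1-vs.-3” lines, and confirms that the “2-vs.-3” line passes through the same point. The function names `boundary_lines` and `intersection` are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction as F

def boundary_lines(g121, g212, g131, g313, g232, g323):
    """Return (a, b, c) triples with a*LR1 + b*LR2 = c for lines (6), (7), (8)."""
    return [
        (g121, -g212, g313 - g323),   # "1-vs.-2" line, Eq. (6)
        (g131, g232 - g212, g313),    # "1-vs.-3" line, Eq. (7)
        (g131 - g121, g232, g323),    # "2-vs.-3" line, Eq. (8)
    ]

def intersection(line_a, line_b):
    """Intersect two lines by Cramer's rule; assumes they are not parallel."""
    a1, b1, c1 = line_a
    a2, b2, c2 = line_b
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Parameter values in the order g121, g212, g131, g313, g232, g323,
# as given for the three examples in the text.
examples = {
    "second quadrant": [F(1, 10), F(1, 10), F(1, 10), F(1, 10), F(2, 10), F(4, 10)],
    "third quadrant":  [F(3, 10), F(3, 10), F(1, 10), F(1, 10), F(1, 10), F(1, 10)],
    "fourth quadrant": [F(1, 10), F(1, 10), F(2, 10), F(4, 10), F(1, 10), F(1, 10)],
}

points = {}
for name, gammas in examples.items():
    line_12, line_13, line_23 = boundary_lines(*gammas)
    lr1, lr2 = intersection(line_12, line_13)
    # The third line must share the intersection point of the first two.
    assert line_23[0] * lr1 + line_23[1] * lr2 == line_23[2]
    points[name] = (lr1, lr2)

for name, (lr1, lr2) in points.items():
    print(f"{name}: LR1 = {lr1}, LR2 = {lr2}")
```

Representing each line as coefficients (a, b, c) rather than as slope and intercept lets the same code handle the vertical “1-vs.-3” line of the fourth-quadrant example.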


IV.  Restrictions  Determined  by  the  “1-vs.-3”  Line 

From the conditions on the $\gamma_{iji}$ parameters, we can readily derive conditions on the decision
boundaries themselves. If we denote the slope of the “i-vs.-j” line by $m_{ij}$, its y-intercept by $b_{ij}$,
and its x-intercept by $x_{ij}$, we have

$$m_{12} = \frac{\gamma_{121}}{\gamma_{212}} > 0 \tag{18}$$

$$x_{13} = \frac{\gamma_{313}}{\gamma_{131}} > 0 \tag{19}$$

$$b_{23} = \frac{\gamma_{323}}{\gamma_{232}} > 0. \tag{20}$$

These  are  the  three  conditions  stated  in  [22]. 

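Further constraints on the decision boundaries can be obtained by considering the two cases
$\gamma_{232} - \gamma_{212} > 0$ and $\gamma_{232} - \gamma_{212} < 0$. In the first case (i.e., $\gamma_{232} > \gamma_{212}$, or $U_{1|2} > U_{3|2}$), we have

$$m_{13} = \frac{-\gamma_{131}}{\gamma_{232} - \gamma_{212}} < 0 \tag{21}$$

$$b_{13} = \frac{\gamma_{313}}{\gamma_{232} - \gamma_{212}} > 0. \tag{22}$$

We also have

$$m_{23} = \frac{-(\gamma_{131} - \gamma_{121})}{\gamma_{232}} = \frac{(\gamma_{232} - \gamma_{212}) m_{13} + \gamma_{212} m_{12}}{\gamma_{232}} = \left(1 - \frac{\gamma_{212}}{\gamma_{232}}\right) m_{13} + \frac{\gamma_{212}}{\gamma_{232}} m_{12}. \tag{23}$$

This is a weighted sum of the slopes $m_{12}$ and $m_{13}$, where the weights are positive and sum to
one. Since we must have $m_{13} < m_{12}$ from (18) and (21), it must therefore be the case that

$$m_{13} < m_{23} < m_{12}. \tag{24}$$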

Furthermore,

$$b_{23} = \frac{\gamma_{323}}{\gamma_{232}} = \frac{\gamma_{313} - (\gamma_{313} - \gamma_{323})}{\gamma_{232}} = \frac{(\gamma_{232} - \gamma_{212}) b_{13} + \gamma_{212} b_{12}}{\gamma_{232}} = \left(1 - \frac{\gamma_{212}}{\gamma_{232}}\right) b_{13} + \frac{\gamma_{212}}{\gamma_{232}} b_{12}. \tag{25}$$

This is a weighted sum of the y-intercepts $b_{12}$ and $b_{13}$, where the weights are positive and sum
to one; thus, in addition to (24), we have the condition

$$\min(b_{12}, b_{13}) < b_{23} < \max(b_{12}, b_{13}). \tag{26}$$

If $b_{12} < 0$, then (26) immediately reduces to $b_{12} < b_{23} < b_{13}$ (by (22), we are considering a
special case in which $b_{13} > 0$). This is illustrated in Fig. 4 for the slightly different situations
$x_{12} < x_{13}$ and $x_{12} > x_{13}$. If, on the other hand, $b_{12} > 0$, then (24) and (26) together imply
two possible situations, depending on whether $b_{12} < b_{13}$ or $b_{12} > b_{13}$. These possibilities are
illustrated in Fig. 5.
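
The ordering constraints (24) and (26) can be spot-checked numerically. The following Python sketch draws random positive parameter sets, keeps those in the first case ($\gamma_{232} > \gamma_{212}$), computes slopes and y-intercepts from (6)-(8), and counts violations. The sampling range, the random seed, the small margin used to stay away from the degenerate boundary, and the function names are all assumptions made for this illustration.

```python
import random

def sample_gammas(rng):
    """Draw six positive gamma parameters (hypothetical uniform sampling)."""
    names = ("g121", "g212", "g131", "g313", "g232", "g323")
    return {name: rng.uniform(0.05, 1.0) for name in names}

def slopes_and_intercepts(g):
    """Slopes m_ij and y-intercepts b_ij of lines (6)-(8); needs g232 != g212."""
    m12 = g["g121"] / g["g212"]
    b12 = -(g["g313"] - g["g323"]) / g["g212"]
    m13 = -g["g131"] / (g["g232"] - g["g212"])
    b13 = g["g313"] / (g["g232"] - g["g212"])
    m23 = -(g["g131"] - g["g121"]) / g["g232"]
    b23 = g["g323"] / g["g232"]
    return m12, b12, m13, b13, m23, b23

rng = random.Random(0)
checked = 0
violations = 0
while checked < 1000:
    g = sample_gammas(rng)
    # Keep only the first case, with a margin from the degenerate boundary.
    if g["g232"] - g["g212"] <= 1e-3:
        continue
    m12, b12, m13, b13, m23, b23 = slopes_and_intercepts(g)
    ok_24 = m13 < m23 < m12                          # Eq. (24)
    ok_26 = min(b12, b13) < b23 < max(b12, b13)      # Eq. (26)
    violations += not (ok_24 and ok_26)
    checked += 1

print(f"checked {checked} parameter sets, {violations} violations")
```

A random search of this kind cannot prove the inequalities, but it is a quick guard against sign errors when transcribing the slope and intercept formulas.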

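We now consider the case $\gamma_{232} - \gamma_{212} < 0$ (i.e., $\gamma_{232} < \gamma_{212}$, or $U_{1|2} < U_{3|2}$), which yields

$$m_{13} = \frac{-\gamma_{131}}{\gamma_{232} - \gamma_{212}} > 0 \tag{27}$$

$$b_{13} = \frac{\gamma_{313}}{\gamma_{232} - \gamma_{212}} < 0. \tag{28}$$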

Fig. 4. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} > 0$ and $b_{12} < 0$. In (a), the “2-vs.-3” line can lie
anywhere between the two dashed lines shown (the region between the lower dashed and dotted lines is excluded because
$b_{23} > 0$); observations in the unlabeled region above this line will be decided “$\pi_2$”, and those below this line will be decided
“$\pi_3$”. In (b), the “2-vs.-3” line can lie anywhere in the unlabeled region (provided it shares the intersection point of the “1-vs.-2”
and “1-vs.-3” lines shown); observations above this line will be decided “$\pi_2$”, and those below this line will be decided “$\pi_3$”.


Fig. 5. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} > 0$ and $b_{12} > 0$. In (a), the “2-vs.-3” line can lie
anywhere in the unlabeled region; observations above this line will be decided “$\pi_2$”, and those below this line will be decided
“$\pi_3$”. In (b), the “2-vs.-3” line can lie anywhere between the “1-vs.-2” and “1-vs.-3” lines (provided it shares their intersection
point); note that observations in this region will be decided “$\pi_1$” regardless of the position of this line.

We now have

$$m_{12} = \frac{\gamma_{121}}{\gamma_{212}} = \frac{\gamma_{131} - (\gamma_{131} - \gamma_{121})}{\gamma_{212}} = \frac{-(\gamma_{232} - \gamma_{212}) m_{13} + \gamma_{232} m_{23}}{\gamma_{212}} = \left(1 - \frac{\gamma_{232}}{\gamma_{212}}\right) m_{13} + \frac{\gamma_{232}}{\gamma_{212}} m_{23}. \tag{29}$$

This is again a weighted sum in which the weights are positive and sum to one, giving

$$\min(m_{13}, m_{23}) < m_{12} < \max(m_{13}, m_{23}). \tag{30}$$

Furthermore,

$$b_{12} = \frac{\gamma_{313} - \gamma_{323}}{-\gamma_{212}} = \frac{-\gamma_{313} + \gamma_{323}}{\gamma_{212}} = \frac{-(\gamma_{232} - \gamma_{212}) b_{13} + \gamma_{232} b_{23}}{\gamma_{212}} = \left(1 - \frac{\gamma_{232}}{\gamma_{212}}\right) b_{13} + \frac{\gamma_{232}}{\gamma_{212}} b_{23}. \tag{31}$$

This is a weighted sum of the y-intercepts $b_{13}$ and $b_{23}$, where the weights are positive and sum
to one; thus, in addition to (30), we have the condition

$$b_{13} < b_{12} < b_{23}, \tag{32}$$

since $b_{13} < b_{23}$ by (20) and (28).

If $m_{23} < 0$, then (30) immediately reduces to $m_{23} < m_{12} < m_{13}$ (by (27), we are considering
a special case in which $m_{13} > 0$). This is illustrated in Fig. 6 for the slightly different situations
$x_{13} < x_{23}$ and $x_{13} > x_{23}$. If, on the other hand, $m_{23} > 0$, then (30) and (32) together imply
two possible situations, depending on whether $m_{23} < m_{13}$ or $m_{23} > m_{13}$. These possibilities are
illustrated in Fig. 7.
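
For the second case ($\gamma_{232} < \gamma_{212}$), the weighted-sum structure behind (30) and (32) can be made explicit in code. The sketch below assumes the weight $w = \gamma_{232}/\gamma_{212}$ on the “2-vs.-3” line and $1 - w$ on the “1-vs.-3” line, and checks on random parameter sets that the “1-vs.-2” slope and y-intercept equal the corresponding convex combinations. The sampling scheme, seed, and variable names are hypothetical.

```python
import random

rng = random.Random(1)
max_residual = 0.0
all_ordered = True

for _ in range(1000):
    # Hypothetical sampling: positive parameters with g232 < g212 (second case).
    g212 = rng.uniform(0.2, 1.0)
    g232 = rng.uniform(0.05, 0.9) * g212
    g121, g131, g313, g323 = (rng.uniform(0.05, 1.0) for _ in range(4))

    # Slopes and y-intercepts read off from lines (6), (7), and (8).
    m12 = g121 / g212
    b12 = -(g313 - g323) / g212
    m13 = -g131 / (g232 - g212)
    b13 = g313 / (g232 - g212)
    m23 = -(g131 - g121) / g232
    b23 = g323 / g232

    # Weight on the "2-vs.-3" line; 1 - w is the weight on the "1-vs.-3" line.
    w = g232 / g212
    max_residual = max(
        max_residual,
        abs(m12 - ((1 - w) * m13 + w * m23)),
        abs(b12 - ((1 - w) * b13 + w * b23)),
    )

    all_ordered &= min(m13, m23) < m12 < max(m13, m23)   # Eq. (30)
    all_ordered &= b13 < b12 < b23                       # Eq. (32)

print(f"largest weighted-sum residual: {max_residual:.2e}")
print(f"orderings (30) and (32) hold for all samples: {all_ordered}")
```

Because $0 < w < 1$ in this case, the “1-vs.-2” line is pinned between the other two lines in both slope and y-intercept, which is the geometric content of Figs. 6 and 7.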

Fig. 6. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} < 0$ and $m_{23} < 0$. In (a), the “1-vs.-2” line can
lie anywhere between the two dashed lines shown (the region between the lower dashed and dotted lines is excluded because
$m_{12} > 0$); observations in the unlabeled region above this line will be decided “$\pi_2$”, and those below this line will be decided
“$\pi_1$”. In (b), the “1-vs.-2” line can lie anywhere in the unlabeled region (provided it shares the intersection point of the “1-vs.-3”
and “2-vs.-3” lines shown); observations above this line will be decided “$\pi_2$”, and those below this line will be decided “$\pi_1$”.

Fig. 7. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} < 0$ and $m_{23} > 0$. In (a), the “1-vs.-2” line can lie
anywhere in the unlabeled region; observations above this line will be decided “$\pi_2$”, and those below this line will be decided
“$\pi_1$”. In (b), the “1-vs.-2” line can lie anywhere between the “1-vs.-3” and “2-vs.-3” lines (provided it shares their intersection
point); note that observations in this region will be decided “$\pi_3$” regardless of the position of this line.

One may of course ask what happens when $\gamma_{232} - \gamma_{212} = 0$ (i.e., $\gamma_{232} = \gamma_{212}$, or $U_{1|2} = U_{3|2}$).
In this case, both $m_{13}$ and $b_{13}$ are infinite, and we have

$$m_{23} = \frac{-(\gamma_{131} - \gamma_{121})}{\gamma_{232}} = \frac{-\gamma_{131}}{\gamma_{232}} + \frac{\gamma_{121}}{\gamma_{212}} = \frac{-\gamma_{131}}{\gamma_{232}} + m_{12} < m_{12}, \tag{33}$$

and

$$b_{12} = \frac{\gamma_{323} - \gamma_{313}}{\gamma_{212}} = \frac{\gamma_{323}}{\gamma_{232}} - \frac{\gamma_{313}}{\gamma_{212}} = b_{23} + \frac{-\gamma_{313}}{\gamma_{212}} < b_{23}. \tag{34}$$

Together, (33) and (34) can be considered either a special case of the inequalities (24) and (26),
if we take $m_{13} = -\infty$ and $b_{13} = +\infty$; or of the inequalities (30) and (32), if we take $m_{13} = +\infty$
and $b_{13} = -\infty$. This situation, for the slightly different cases $b_{12} < 0$ and $b_{12} > 0$, is illustrated
in Fig. 8.


V.  Restrictions  Determined  by  the  “2-vs.-3”  Line 

In the preceding section, the possible values of the quantity $\gamma_{232} - \gamma_{212}$ were considered in
order to determine properties of the ideal observer decision boundary lines. It may be argued that
the choice of a parameter from the “1-vs.-3” line, i.e., one of the three available lines, must be an
arbitrary one. In fact, we may consider taking another parameter (or combination of parameters)
from (6)-(8), and using it to determine conditions on the properties of the decision boundary
lines as above. Given that all possible values of the quantity $\gamma_{232} - \gamma_{212}$ were considered, it is
expected that no new conditions should be determinable (let alone conditions inconsistent with
those already determined).

Consider the quantity $\gamma_{131} - \gamma_{121}$ from (8). In particular, when $\gamma_{131} - \gamma_{121} > 0$ (i.e., $\gamma_{131} > \gamma_{121}$,
or $U_{2|1} > U_{3|1}$), we have

$$\frac{1}{m_{23}} = \frac{-\gamma_{232}}{\gamma_{131} - \gamma_{121}} < 0 \tag{35}$$

$$x_{23} = \frac{\gamma_{323}}{\gamma_{131} - \gamma_{121}} > 0. \tag{36}$$

Fig. 8. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} = 0$. In (a), the “2-vs.-3” line can lie anywhere between
the two dashed lines shown (the region between the lower dashed and dotted lines is excluded because $b_{23} > 0$); observations in
the unlabeled region above this line will be decided “$\pi_2$”, and those below this line will be decided “$\pi_3$”. In (b), the “2-vs.-3”
line can lie anywhere in the unlabeled region; observations above this line will be decided “$\pi_2$”, and those below this line will
be decided “$\pi_3$”.


Through reasoning similar to that of the preceding section, we also have

$$\frac{1}{m_{23}} < \frac{1}{m_{13}} < \frac{1}{m_{12}} \tag{37}$$

and

$$\min(x_{12}, x_{23}) < x_{13} < \max(x_{12}, x_{23}). \tag{38}$$

If $x_{12} < 0$, then (38) immediately reduces to $x_{12} < x_{13} < x_{23}$ (by (36), we are considering a
special case in which $x_{23} > 0$). This is illustrated in Fig. 9 for the slightly different situations
$b_{12} < b_{23}$ and $b_{12} > b_{23}$. If, on the other hand, $x_{12} > 0$, then (37) and (38) together imply
two possible situations, depending on whether $x_{12} < x_{23}$ or $x_{12} > x_{23}$. These possibilities are
illustrated in Fig. 10.
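
The constraints (37) and (38) can be spot-checked in the same spirit as those of Sec. IV. The Python sketch below samples parameter sets with $\gamma_{131} > \gamma_{121}$ and works directly with inverse slopes and x-intercepts derived from (6)-(8). The sampling scheme, seed, and variable names are assumptions made for this illustration only.

```python
import random

rng = random.Random(2)
all_hold = True

for _ in range(1000):
    # Hypothetical sampling with g131 > g121 (first case of this section).
    g131 = rng.uniform(0.2, 1.0)
    g121 = rng.uniform(0.05, 0.9) * g131
    g212, g232, g313, g323 = (rng.uniform(0.05, 1.0) for _ in range(4))

    # Inverse slopes 1/m_ij, obtained from (6)-(8) by solving for LR1.
    inv_m12 = g212 / g121
    inv_m13 = -(g232 - g212) / g131
    inv_m23 = -g232 / (g131 - g121)

    # x-intercepts x_ij, obtained by setting LR2 = 0 in (6)-(8).
    x12 = (g313 - g323) / g121
    x13 = g313 / g131
    x23 = g323 / (g131 - g121)

    all_hold &= inv_m23 < inv_m13 < inv_m12              # Eq. (37)
    all_hold &= min(x12, x23) < x13 < max(x12, x23)      # Eq. (38)

print(f"(37) and (38) hold for all samples: {all_hold}")
```

Working with inverse slopes rather than slopes avoids a division by zero when $\gamma_{232} = \gamma_{212}$, where the “1-vs.-3” line is vertical but its inverse slope is simply zero.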


If $\gamma_{131} - \gamma_{121} < 0$ (i.e., $\gamma_{131} < \gamma_{121}$, or $U_{2|1} < U_{3|1}$), we have

$$\frac{1}{m_{23}} = \frac{-\gamma_{232}}{\gamma_{131} - \gamma_{121}} > 0 \tag{39}$$

$$x_{23} = \frac{\gamma_{323}}{\gamma_{131} - \gamma_{121}} < 0. \tag{40}$$

Fig. 9. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} > 0$ and $x_{12} < 0$. In (a), the “1-vs.-3” line can
lie anywhere between the two dashed lines shown (the region between the left dashed and dotted lines is excluded because
$x_{13} > 0$); observations in the unlabeled region to the right of this line will be decided “$\pi_1$”, and those to the left of this line
will be decided “$\pi_3$”. In (b), the “1-vs.-3” line can lie anywhere in the unlabeled region (provided it shares the intersection
point of the “1-vs.-2” and “2-vs.-3” lines shown); observations to the right of this line will be decided “$\pi_1$”, and those to the
left of this line will be decided “$\pi_3$”.


Fig. 10. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} > 0$ and $x_{12} > 0$. In (a), the “1-vs.-3” line can lie
anywhere in the unlabeled region; observations to the right of this line will be decided “$\pi_1$”, and those to the left of this line
will be decided “$\pi_3$”. In (b), the “1-vs.-3” line can lie anywhere between the “1-vs.-2” and “2-vs.-3” lines (provided it shares
their intersection point); note that observations in this region will be decided “$\pi_2$” regardless of the position of this line.

Fig. 11. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} < 0$ and $1/m_{13} < 0$. In (a), the “1-vs.-2” line can
lie anywhere between the two dashed lines shown (the region between the vertical dashed and dotted lines is excluded because
$m_{12} > 0$, and therefore $1/m_{12} > 0$); observations in the unlabeled region above this line will be decided “$\pi_2$”, and those
below this line will be decided “$\pi_1$”. In (b), the “1-vs.-2” line can lie anywhere in the unlabeled region (provided it shares
the intersection point of the “1-vs.-3” and “2-vs.-3” lines shown); observations above this line will be decided “$\pi_2$”, and those
below this line will be decided “$\pi_1$”.


One can also show

$$\min\left(\frac{1}{m_{13}}, \frac{1}{m_{23}}\right) < \frac{1}{m_{12}} < \max\left(\frac{1}{m_{13}}, \frac{1}{m_{23}}\right) \tag{41}$$

and

$$x_{23} < x_{12} < x_{13}. \tag{42}$$

If $1/m_{13} < 0$, then (41) immediately reduces to $1/m_{13} < 1/m_{12} < 1/m_{23}$ (by (39), we
are considering a special case in which $1/m_{23} > 0$). This is illustrated in Fig. 11 for the
slightly different situations $b_{23} < b_{13}$ and $b_{23} > b_{13}$. If, on the other hand, $1/m_{13} > 0$, then
(41) and (42) together imply two possible situations, depending on whether $1/m_{13} < 1/m_{23}$ or
$1/m_{13} > 1/m_{23}$. These possibilities are illustrated in Fig. 12.

Fig. 12. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} < 0$ and $1/m_{13} > 0$. In (a), the “1-vs.-2” line can lie
anywhere in the unlabeled region; observations above this line will be decided “$\pi_2$”, and those below this line will be decided
“$\pi_1$”. In (b), the “1-vs.-2” line can lie anywhere between the “1-vs.-3” and “2-vs.-3” lines (provided it shares their intersection
point); note that observations in this region will be decided “$\pi_3$” regardless of the position of this line.

Finally, we consider the case $\gamma_{131} - \gamma_{121} = 0$ ($\gamma_{131} = \gamma_{121}$, or $U_{2|1} = U_{3|1}$), in which both
$1/m_{23}$ and $x_{23}$ are infinite. We now have

$$\frac{1}{m_{13}} < \frac{1}{m_{12}} \tag{43}$$

and

$$x_{12} < x_{13}. \tag{44}$$

Together, (43) and (44) can be considered either a special case of the inequalities (37) and
(38), if we take $1/m_{23} = -\infty$ and $x_{23} = +\infty$; or of the inequalities (41) and (42), if we
take $1/m_{23} = +\infty$ and $x_{23} = -\infty$. This situation, for the slightly different cases $x_{12} < 0$ and
$x_{12} > 0$, is illustrated in Fig. 13.

Fig. 13. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} = 0$. In (a), the “1-vs.-3” line can lie anywhere between
the two dashed lines shown (the region between the leftmost dashed and dotted lines is excluded because $x_{13} > 0$); observations
in the unlabeled region to the right of this line will be decided “$\pi_1$”, and those to the left of this line will be decided “$\pi_3$”. In
(b), the “1-vs.-3” line can lie anywhere in the unlabeled region; observations to the right of this line will be decided “$\pi_1$”, and
those to the left of this line will be decided “$\pi_3$”.

Notice that every figure in this section has one or more corresponding figures in the preceding
section (depending on the possible values of the undetermined decision boundary parameter
being illustrated in that figure). Specifically,

Fig. 9(a) ⇒ Figs. 5(a), 6(a), 8(b)

Fig. 9(b) ⇒ Fig. 5(b)

Fig. 10(a) ⇒ Figs. 4(a), 6(a), 8(a)

Fig. 10(b) ⇒ Figs. 4(b), 6(b), 8(a)

Fig. 11(a) ⇒ Figs. 4(a), 5(a)

Fig. 11(b) ⇒ Fig. 5(b)

Fig. 12(a) ⇒ Figs. 7(a), 8(a), 8(b)

Fig. 12(b) ⇒ Fig. 7(b)

Fig. 13(a) ⇒ Figs. 5(a), 7(a), 8(b), 5(b)

Fig. 13(b) ⇒ Figs. 4(a), 7(a), 8(a)

That is, none of the conditions derived in this section are inconsistent with those derived in
the preceding section. Also note the symmetry between the corresponding equations and figures
in Secs. IV and V, if one “swaps” the labels of classes $\pi_1$ and $\pi_2$, and additionally replaces $m_{ij}$
with $1/m_{ij}$ and $x_{ij}$ with $b_{ij}$. (Intuitively, if one “flips” the figures in one section about the $y = x$
line, one obtains the figures in the other section.)


VI.  Restrictions  Determined  by  the  “1-vs.-2”  Line 


In this section, we consider the possible values of the quantity $\gamma_{313} - \gamma_{323}$. As in the preceding
section, we expect to obtain no conditions inconsistent with those already derived.

When $\gamma_{313} - \gamma_{323} > 0$ (i.e., $\gamma_{313} > \gamma_{323}$, or $U_{2|3} > U_{1|3}$), we have


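$$\frac{1}{b_{12}} = \frac{-\gamma_{212}}{\gamma_{313} - \gamma_{323}} < 0 \tag{45}$$

$$\frac{1}{x_{12}} = \frac{\gamma_{121}}{\gamma_{313} - \gamma_{323}} > 0. \tag{46}$$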

Through reasoning similar to that of the preceding sections, we also have

$$\frac{1}{b_{12}} < \frac{1}{b_{13}} < \frac{1}{b_{23}} \tag{48}$$

and

$$\min\left(\frac{1}{x_{23}}, \frac{1}{x_{12}}\right) < \frac{1}{x_{13}} < \max\left(\frac{1}{x_{23}}, \frac{1}{x_{12}}\right). \tag{49}$$

If $1/x_{23} < 0$, then (49) immediately reduces to $1/x_{23} < 1/x_{13} < 1/x_{12}$ (by (46), we are
considering a special case in which $1/x_{12} > 0$). This is illustrated in Fig. 14 for the slightly
different situations $m_{23} < m_{12}$ and $m_{23} > m_{12}$. If, on the other hand, $1/x_{23} > 0$, then (48)
and (49) together imply two possible situations, depending on whether $1/x_{23} < 1/x_{12}$ or $1/x_{23} >
1/x_{12}$. These possibilities are illustrated in Fig. 15.
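
The “reasoning similar to that of the preceding sections” is not written out in the text. As a sketch (a reconstruction from the line equations (6)-(8), not a quotation from the source), the reciprocal intercepts of the “1-vs.-3” line can be expressed as weighted sums of those of the other two lines:

```latex
\begin{align*}
\frac{1}{b_{13}} = \frac{\gamma_{232} - \gamma_{212}}{\gamma_{313}}
  &= \frac{\gamma_{323}}{\gamma_{313}} \cdot \frac{1}{b_{23}}
   + \frac{\gamma_{313} - \gamma_{323}}{\gamma_{313}} \cdot \frac{1}{b_{12}}, \\[4pt]
\frac{1}{x_{13}} = \frac{\gamma_{131}}{\gamma_{313}}
  &= \frac{\gamma_{323}}{\gamma_{313}} \cdot \frac{1}{x_{23}}
   + \frac{\gamma_{313} - \gamma_{323}}{\gamma_{313}} \cdot \frac{1}{x_{12}},
\end{align*}
% using 1/b_{23} = \gamma_{232}/\gamma_{323},
%       1/b_{12} = -\gamma_{212}/(\gamma_{313} - \gamma_{323}),
%       1/x_{23} = (\gamma_{131} - \gamma_{121})/\gamma_{323},
%       1/x_{12} = \gamma_{121}/(\gamma_{313} - \gamma_{323}).
```

When $\gamma_{313} - \gamma_{323} > 0$, both weights are positive and sum to one, so each reciprocal intercept of the “1-vs.-3” line lies between the corresponding quantities for the other two lines. Since $1/b_{12} < 0 < 1/b_{23}$ in this case, the first relation gives a strict ordering of the reciprocal y-intercepts, while the second gives only the min/max bound on the reciprocal x-intercepts.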


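If $\gamma_{313} - \gamma_{323} < 0$ (i.e., $\gamma_{313} < \gamma_{323}$, or $U_{2|3} < U_{1|3}$), we have

$$\frac{1}{b_{12}} = \frac{-\gamma_{212}}{\gamma_{313} - \gamma_{323}} > 0 \tag{50}$$

$$\frac{1}{x_{12}} = \frac{\gamma_{121}}{\gamma_{313} - \gamma_{323}} < 0. \tag{51}$$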

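One can also show

$$\min\left(\frac{1}{b_{13}}, \frac{1}{b_{12}}\right) < \frac{1}{b_{23}} < \max\left(\frac{1}{b_{13}}, \frac{1}{b_{12}}\right) \tag{53}$$

and

$$\frac{1}{x_{12}} < \frac{1}{x_{23}} < \frac{1}{x_{13}}. \tag{54}$$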

Fig. 14. Example ideal observer decision rules for the case $\gamma_{313} - \gamma_{323} > 0$ and $1/x_{23} < 0$. In (a), the “1-vs.-3” line can lie
anywhere between the two dashed lines shown (the region between the horizontal dashed and dotted lines is excluded because
$x_{13} > 0$, and therefore $1/x_{13} > 0$); observations in the unlabeled region to the left of this line will be decided “$\pi_3$”, and those
to the right of this line will be decided “$\pi_1$”. In (b), the “1-vs.-3” line can lie anywhere in the unlabeled region (provided it shares
the intersection point of the “1-vs.-2” and “2-vs.-3” lines shown); observations to the left of this line will be decided “$\pi_3$”, and
those to the right of this line will be decided “$\pi_1$”.


Fig. 15. Example ideal observer decision rules for the case $\gamma_{313} - \gamma_{323} > 0$ and $1/x_{23} > 0$. In (a), the “1-vs.-3” line can lie
anywhere in the unlabeled region; observations to the left of this line will be decided “$\pi_3$”, and those to the right of this line
will be decided “$\pi_1$”. In (b), the “1-vs.-3” line can lie anywhere between the “1-vs.-2” and “2-vs.-3” lines (provided it shares
their intersection point); note that observations in this region will be decided “$\pi_2$” regardless of the position of this line.

Fig. 16. Example ideal observer decision rules for the case $\gamma_{313} - \gamma_{323} < 0$ and $1/b_{13} < 0$. In (a), the “2-vs.-3” line can
lie anywhere between the two dashed lines shown (the region between the vertical dashed and dotted lines is excluded because
$b_{23} > 0$, and therefore $1/b_{23} > 0$); observations in the unlabeled region above this line will be decided “$\pi_2$”, and those
below this line will be decided “$\pi_3$”. In (b), the “2-vs.-3” line can lie anywhere in the unlabeled region (provided it shares
the intersection point of the “1-vs.-2” and “1-vs.-3” lines shown); observations above this line will be decided “$\pi_2$”, and those
below this line will be decided “$\pi_3$”.


If $1/b_{13} < 0$, then (53) immediately reduces to $1/b_{13} < 1/b_{23} < 1/b_{12}$ (by (50), we are
considering a special case in which $1/b_{12} > 0$). This is illustrated in Fig. 16 for the slightly
different situations $m_{12} < m_{13}$ and $m_{12} > m_{13}$. If, on the other hand, $1/b_{13} > 0$, then (53)
and (54) together imply two possible situations, depending on whether $1/b_{13} < 1/b_{12}$ or $1/b_{13} > 1/b_{12}$.
These possibilities are illustrated in Fig. 17.


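Finally, we consider the case $\gamma_{323} - \gamma_{313} = 0$ (i.e., $\gamma_{313} = \gamma_{323}$, or $U_{2|3} = U_{1|3}$), in which both
$1/b_{12}$ and $1/x_{12}$ are infinite. We now have

$$\frac{1}{b_{13}} < \frac{1}{b_{23}} \tag{55}$$

and

$$\frac{1}{x_{23}} < \frac{1}{x_{13}}. \tag{56}$$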

Fig. 17. Example ideal observer decision rules for the case $\gamma_{313} - \gamma_{323} < 0$ and $1/b_{13} > 0$. In (a), the “2-vs.-3” line can lie
anywhere in the unlabeled region; observations above this line will be decided “$\pi_2$”, and those below this line will be decided
“$\pi_3$”. In (b), the “2-vs.-3” line can lie anywhere between the “1-vs.-2” and “1-vs.-3” lines (provided it shares their intersection
point); note that observations in this region will be decided “$\pi_1$” regardless of the position of this line.

Together, (55) and (56) can be considered either a special case of the inequalities (53) and
(54), if we take $1/b_{12} = +\infty$ and $1/x_{12} = -\infty$; or of the inequalities (48) and (49), if we
take $1/b_{12} = -\infty$ and $1/x_{12} = +\infty$. This situation, for the slightly different cases $1/x_{23} < 0$ and
$1/x_{23} > 0$, is illustrated in Fig. 18.
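
The three cases of this section can be spot-checked numerically. The sketch below is not taken from the text: it assumes particular forms for the three decision boundary lines in terms of the $\gamma_{ijk}$ (stated in the docstring and chosen to agree with (45) and (46)), assumes all $\gamma_{ijk}$ are positive integers, and uses hypothetical function names and example values.

```python
from fractions import Fraction as F


def reciprocal_intercepts(g121, g212, g131, g232, g313, g323):
    """Return dicts of reciprocal LR2-intercepts (1/b) and LR1-intercepts (1/x).

    Assumed line forms (an assumption for illustration, not quoted):
      1-vs-2: g121*LR1 - g212*LR2          = g313 - g323
      1-vs-3: g131*LR1 + (g232 - g212)*LR2 = g313
      2-vs-3: (g131 - g121)*LR1 + g232*LR2 = g323
    """
    d = g313 - g323
    inv_b = {"13": F(g232 - g212, g313), "23": F(g232, g323)}
    inv_x = {"13": F(g131, g313), "23": F(g131 - g121, g323)}
    if d != 0:  # 1/b12 and 1/x12 are infinite when d == 0
        inv_b["12"] = F(-g212, d)
        inv_x["12"] = F(g121, d)
    return inv_b, inv_x


def conditions_hold(g121, g212, g131, g232, g313, g323):
    """Check (48)-(49), (53)-(54) or (55)-(56), depending on the sign of g313 - g323."""
    inv_b, inv_x = reciprocal_intercepts(g121, g212, g131, g232, g313, g323)
    d = g313 - g323
    if d > 0:  # inequalities (48) and (49)
        ok_b = inv_b["12"] < inv_b["13"] < inv_b["23"]
        lo, hi = sorted((inv_x["23"], inv_x["12"]))
        ok_x = lo < inv_x["13"] < hi
    elif d < 0:  # inequalities (53) and (54)
        lo, hi = sorted((inv_b["13"], inv_b["12"]))
        ok_b = lo < inv_b["23"] < hi
        ok_x = inv_x["12"] < inv_x["23"] < inv_x["13"]
    else:  # inequalities (55) and (56)
        ok_b = inv_b["13"] < inv_b["23"]
        ok_x = inv_x["23"] < inv_x["13"]
    return ok_b and ok_x


# Hypothetical example values, ordered (g121, g212, g131, g232, g313, g323).
examples = {
    "positive": (2, 3, 5, 4, 7, 3),
    "negative": (2, 3, 5, 4, 3, 7),
    "zero": (2, 3, 5, 4, 5, 5),
}
results = {name: conditions_hold(*g) for name, g in examples.items()}
print(results)
```

Exact rational arithmetic is used so that the strict inequalities are not blurred by rounding. A check like this only confirms consistency for the chosen values; it is not a substitute for the algebraic argument.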

Notice  that  every  figure  in  this  section  has  one  or  more  corresponding  figures  in  Sec.  IV 
(depending  on  the  possible  values  of  the  undetermined  decision  boundary  parameter  being 
illustrated  in  that  figure).  Specifically, 


Fig. 14(a) ⇒ Figs. 4(a), 8(a), 7(a)

Fig. 14(b) ⇒ Fig. 7(b)

Fig. 15(a) ⇒ Figs. 4(a), 8(a), 6(a)

Fig. 15(b) ⇒ Figs. 4(b), 6(b), 8(a)

Fig. 16(a) ⇒ Figs. 6(a), 7(a), 8(b)

Fig. 16(b) ⇒ Fig. 7(b)

Fig. 17(a) ⇒ Fig. 5(a)

Fig. 17(b) ⇒ Fig. 5(b)

Fig. 18(a) ⇒ Figs. 5(a), 8(b), 7(a), 7(b)

Fig. 18(b) ⇒ Figs. 5(a), 8(b), 6(a), 6(b)

Fig. 18. Example ideal observer decision rules for the case $\gamma_{313} - \gamma_{323} = 0$. In (a), the “1-vs.-3” line can lie anywhere between
the two dashed lines shown (the region between the horizontal dashed and dotted lines is excluded because $1/x_{13} > 0$);
observations in the unlabeled region to the right of this line will be decided “$\pi_1$”, and those to the left of this line will be
decided “$\pi_3$”. In (b), the “1-vs.-3” line can lie anywhere in the unlabeled region; observations to the right of this line will be
decided “$\pi_1$”, and those to the left of this line will be decided “$\pi_3$”.


That  is,  none  of  the  conditions  derived  in  this  section  are  inconsistent  with  those  derived  in 
the  preceding  sections. 


VII.  Discussion  and  Conclusions 

The  repetitive  nature  of  the  algebraic  manipulations  given  in  the  preceding  sections  should  not 
be  allowed  to  distract  from  the  fundamental  point  being  made:  given  the  locations  of  two  of  the 
decision  boundary  lines,  the  location  of  the  third  is  not  completely  arbitrary.  That  is,  aside  from 
the  obvious  (given  (6)-(8))  constraint  that  the  lines  must  share  a  common  intersection  point,  it 
can  also  be  shown  that  the  slope  of  the  third  line  is  constrained  by  the  slopes  of  the  first  two. 
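
The common-intersection constraint can be illustrated with a small calculation. This is a sketch under an assumption: the coefficient triples below encode hypothetical forms of the three decision boundary lines as `a*LR1 + b*LR2 = c`, chosen to be consistent with the intercepts in (45) and (46) rather than copied from (6)-(8). The helper names and the example values are likewise illustrative.

```python
from fractions import Fraction as F


def intersect(line1, line2):
    """Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2 via Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel")
    return F(c1 * b2 - c2 * b1, det), F(a1 * c2 - a2 * c1, det)


def decision_lines(g121, g212, g131, g232, g313, g323):
    """Assumed (a, b, c) coefficients of the three decision boundary lines."""
    return {
        "1-vs-2": (g121, -g212, g313 - g323),
        "1-vs-3": (g131, g232 - g212, g313),
        "2-vs-3": (g131 - g121, g232, g323),
    }


lines = decision_lines(2, 3, 5, 4, 7, 3)  # hypothetical positive criteria
point = intersect(lines["1-vs-2"], lines["1-vs-3"])

# The third line passes through the same point.
a, b, c = lines["2-vs-3"]
on_third_line = a * point[0] + b * point[1] == c
print(point, on_third_line)
```

In this parameterization the coefficients of the “2-vs.-3” line are the difference of those of the “1-vs.-3” and “1-vs.-2” lines, which is why the shared point is automatic. The intersection point need not lie in the quadrant of nonnegative likelihood ratios, as these example values show.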

The significance of this result may be difficult to appreciate at first glance. It is perhaps best
illustrated by comparison with the two-class classifier, for which the ROC operating point
coordinates (e.g., the true-positive fraction (TPF) and false-positive fraction (FPF)) are determined
by a single decision criterion $\gamma$, which is free to vary without restriction throughout its domain
of definition. For the two-class ideal observer, in particular, an observation is decided “positive”
(assigned to the class $\pi_1$) if $LR_1 > \gamma$, where $\gamma$ can take on any nonnegative value. Furthermore,
the FPF and TPF are related in a very simple way to the cdfs of $LR_1$, and are thus monotonic
in the decision criterion $\gamma$. For the three-class ideal observer, this straightforward relationship
is lost; indeed, Figs. 5(b), 7(b), 10(b), 12(b), 15(b), and 17(b) show that for certain values of
four of the five decision criteria, the misclassification probabilities (i.e., the ROC operating
point coordinates) can be independent of the fifth decision criterion.
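
The monotonic two-class relationship can be made concrete with a small example. The sketch below assumes an equal-variance Gaussian model that is not part of the text: a scalar observation distributed as N(d, 1) for the positive class and N(0, 1) for the negative class, so that the likelihood ratio is exp(d*x - d*d/2). The function names and the value of `d` are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def operating_point(gamma, d=1.5):
    """Return (FPF, TPF) for the rule 'decide positive iff LR > gamma'.

    Assumed model: x ~ N(d, 1) for positives and x ~ N(0, 1) for negatives,
    so LR(x) = exp(d*x - d*d/2), and LR > gamma is equivalent to x > t.
    """
    if gamma <= 0.0:
        return 1.0, 1.0  # LR is always positive, so every case is called positive
    t = (math.log(gamma) + d * d / 2.0) / d
    return 1.0 - phi(t), 1.0 - phi(t - d)


gammas = [0.0, 0.1, 0.5, 1.0, 2.0, 10.0]
points = [operating_point(g) for g in gammas]
for g, (fpf, tpf) in zip(gammas, points):
    print(f"gamma={g:5.1f}  FPF={fpf:.3f}  TPF={tpf:.3f}")
```

As the single criterion increases, both FPF and TPF decrease together, so each value of the criterion picks out one operating point on the ROC curve. It is this one-parameter ordering that has no direct analogue once three decision boundary lines are involved.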

More succinctly, the relationship between the decision criteria and the misclassification
probabilities is not one-to-one, as it is for the two-class ideal observer. A correct formulation of
the  misclassification  probabilities  as  functions  of  the  decision  criteria,  necessary  for  an  explicit 
calculation  of  the  ideal  observer’s  ROC  hypersurface  given  the  decision  variable  probability 
density  functions,  will  require  careful  consideration  of  this  issue  (and  no  doubt  others  yet  to  be 
investigated). 